{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# HTML解析简介\n",
    "\n",
    "*  本周主要内容：HTML解析（parse HTML）及Xpath实践\n",
    "*  20春_Web数据挖掘_week02\n",
    "*  电子讲义原设计者：廖汉腾, 许智超\n",
    "*  电子讲义练习改写者：XXX\n",
    "  * 本周电子讲义互评工作坊，依序做以下动作：\n",
    "     * 在e.nfu.edu.cn下载此文档\n",
    "     * 在自己本地端实操，把ipynb文档中的123456789改名为学号\n",
    "     * <mark>在还没有改动之前，先把此后缀为学号ipynb文档上传至Github为第一版</mark> (其它文档不计)\n",
    "     * 在自己本地端实操，练习所有内容，按需增减本讲义内容，含代码丶markdown丶新数据(含其连结)及\n",
    "     * 及格至少要做: <mark>**\"本周小结内容\" 以markdown语法，按上课及本电子讲义补充内容进行150-500字的摘要说明**</mark>，可利用HTML文内超连结连到同文档其他的笔记内容\n",
    "     * 互评时会要求提交自己文档的改动比较，以方便同学观看你的改动范围及内容\n",
    "  * 本周加分项，以抢快为主，<mark>1人最多只能抢1项</mark>，需以指定的url进行数据挖掘并输出excel，抢快时间<mark>首先</mark>以该代码在Github的提交时间为准，若两人Github提交时间相差不到3分钟，则以<mark>再以</mark>QQ群@老师时间为判断\n",
    "      * C-3 期末总分加1\n",
    "      * C-4 期末总分加2\n",
    "      * C-5 期末总分加5\n",
    "-----\n",
    "![for humans](https://requests-html.kennethreitz.org/_static/requests-html-logo.png)\n",
    "\n",
    "## 复习\n",
    "\n",
    "复习：上周内容，总观使用\n",
    "\n",
    "* requests-html  丶\n",
    "* pd.read_html 丶及\n",
    "* requests + lxml \n",
    "\n",
    "的Web数据挖掘内容，最主要包括以下前后的主要数据挖掘内容\n",
    "\n",
    "1. 使用 HTTP 发送请求（HTTP request）\n",
    "2. 判断 HTTP 及状态（HTTP status code） 及 HTTP 响应（HTTP response）是否正常\n",
    "3. 执行 HTML 解析（parse HTML），通常使用 xpath or CSS selector 选择器\n",
    "\n",
    "<br/>\n",
    "<br/>\n",
    "\n",
    "-----\n",
    "![Xpath Axis](http://krum.rz.uni-mannheim.de/inet-2005/images/xpath-axis.gif)\n",
    "\n",
    "\n",
    "## 本周内容及学习目标\n",
    "\n",
    "本周内容聚焦在第3.部分\n",
    "挑选比较容易Web数据挖掘的网页（i.e. 比较没有以上1. 及2. 的坑），学习解决以下挑战：\n",
    "\n",
    "1. 使用 requests-html 爬取并存取网页文字档，查找[requests-html 中文文档](https://cncert.github.io/requests-html-doc-cn/#/)\n",
    "2. 熟悉 [xpath 语法](https://www.w3cschool.cn/xpath/xpath-syntax.html)丶[xpath 节点](https://www.w3cschool.cn/xpath/xpath-nodes.html)\n",
    "3. 使用 [xpath cheatsheet](https://devhints.io/xpath)\n",
    "  * 在 Chrome Inspector 使用\n",
    "  * 在 requests-html (Python) 使用\n",
    "4. 简易使用 [pd.DataFrame]()\n",
    "\n",
    "学生将实践\n",
    "* 解析简单HTML页面\n",
    "* 使用xpath（不挑greedy vs. 及挑剔ungreedy的策略）\n",
    "* 获取标签tags丶属性attributes丶值values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "/* 本电子讲义使用之CSS */\n",
       "div.code_cell {\n",
       "    background-color: #e5f1fe;\n",
       "}\n",
       "div.cell.selected {\n",
       "    background-color: #effee2;\n",
       "    font-size: 2rem;\n",
       "    line-height: 2.4rem;\n",
       "}\n",
       "div.cell.selected .rendered_html table {\n",
       "    font-size: 2rem !important;\n",
       "    line-height: 2.4rem !important;\n",
       "}\n",
       ".rendered_html pre code {\n",
       "    background-color: #C4E4ff;   \n",
       "    padding: 2px 25px;\n",
       "}\n",
       ".rendered_html pre {\n",
       "    background-color: #99c9ff;\n",
       "}\n",
       "div.code_cell .CodeMirror {\n",
       "    font-size: 2rem !important;\n",
       "    line-height: 2.4rem !important;\n",
       "}\n",
       ".rendered_html img, .rendered_html svg {\n",
       "    max-width: 35%;\n",
       "    height: auto;\n",
       "    float: right;\n",
       "}\n",
       "/* Gradient transparent - color - transparent */\n",
       "hr {\n",
       "    border: 0;\n",
       "    border-bottom: 1px dashed #ccc;\n",
       "}\n",
       ".emoticon{\n",
       "    font-size: 5rem;\n",
       "    line-height: 4.4rem;\n",
       "    text-align: center;\n",
       "    vertical-align: middle;\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<style>\n",
    "/* 本电子讲义使用之CSS */\n",
    "div.code_cell {\n",
    "    background-color: #e5f1fe;\n",
    "}\n",
    "div.cell.selected {\n",
    "    background-color: #effee2;\n",
    "    font-size: 2rem;\n",
    "    line-height: 2.4rem;\n",
    "}\n",
    "div.cell.selected .rendered_html table {\n",
    "    font-size: 2rem !important;\n",
    "    line-height: 2.4rem !important;\n",
    "}\n",
    ".rendered_html pre code {\n",
    "    background-color: #C4E4ff;   \n",
    "    padding: 2px 25px;\n",
    "}\n",
    ".rendered_html pre {\n",
    "    background-color: #99c9ff;\n",
    "}\n",
    "div.code_cell .CodeMirror {\n",
    "    font-size: 2rem !important;\n",
    "    line-height: 2.4rem !important;\n",
    "}\n",
    ".rendered_html img, .rendered_html svg {\n",
    "    max-width: 35%;\n",
    "    height: auto;\n",
    "    float: right;\n",
    "}\n",
    "/* Gradient transparent - color - transparent */\n",
    "hr {\n",
    "    border: 0;\n",
    "    border-bottom: 1px dashed #ccc;\n",
    "}\n",
    ".emoticon{\n",
    "    font-size: 5rem;\n",
    "    line-height: 4.4rem;\n",
    "    text-align: center;\n",
    "    vertical-align: middle;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 基本模块\n",
    "import pandas as pd\n",
    "from requests_html import HTMLSession"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# requsts-html\n",
    "学生将实践\n",
    "* 解析简单HTML页面\n",
    "\n",
    "使用 requests-html 爬取并存取网页文字档，查找[requests-html 中文文档](https://cncert.github.io/requests-html-doc-cn/#/)\n",
    "\n",
    "* API 文档\n",
    "  * HTML类\n",
    "  * Element类\n",
    "  * HTML Sessions (应正名为HTTP Sessions)  \n",
    "* [原文档](https://requests-html.kennethreitz.org//_modules/requests_html.html)\n",
    "\n",
    "要点：HTTP 和 HTML 的分工与合作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HTML类\n",
    "\n",
    "HTML文本的基本使用及保存备用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A1  nfu.edu.cn 搜 文学与传媒学院 保存备用\n",
    "payload = {\n",
    "    \"keyword\":\"文学与传媒学院\",\n",
    "    \"p\":\"1\"\n",
    "}\n",
    "\n",
    "session = HTMLSession()\n",
    "r = session.get(\"http://www.nfu.edu.cn/index.php/home/article/search.html\", params=payload)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\ufeff<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\\r\\n<html>\\r\\n<head>\\r\\n<meta name=\"renderer\" content=\"webkit\">\\r\\n<meta http-equiv=\"x-ua-compatible\" content=\"IE=edge\" >\\r\\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\\r\\n<title>-中山大学南方学院 </title>\\r\\n<meta name=\"keywords\" content=\"\">\\r\\n<meta name=\"description\" content=\"\">\\r\\n\\t\\n<link rel=\"stylesheet\" type=\"text/css\" href=\"/Public/Home/css/swiper-3.3.1.min.css\"/>\\n\\t\\t<link href=\"/Public/Home/css/lin.css\" rel=\"stylesheet\" type=\"text/css\" />\\n\\n\\t\\t<script src=\"/Public/Home/js/jquery-1.11.3.min.js\"></script>\\n\\t\\t<script src=\"/Public/Home/js/jquery-1.11.1.js\"></script>\\n\\t\\t<script src=\"/Public/Home/js/jquery.easie-min.js\" type=\"text/javascript\"></script>\\n\\t\\t<script src=\"/Public/Home/js/swiper.min.js\" type=\"text/javascript\"></script>\\n\\t\\t<script src=\"/Public/Home/js/lin.js\"></script>\\n\\t\\t\\n\\t\\t\\n<link href=\"/Public/Home/page.css\" rel=\"stylesheet\" type=\"text/css\" />\\n<link href=\"/Public/favicon.ico\" rel=\"Shortcut Icon\">\\n<link href=\"/Public/favicon.ico\" rel=\"Bookmark\">\\n\\r\\n\\t</head>\\r\\n<body>\\r\\n\\ufeff<!--头部-->\\n\\t\\t<div class=\"lin-header \">\\n\\t\\t\\t<div class=\"lin-head clearfix\">\\n\\t\\t\\t\\t<h1 class=\"lin-topl\"><a href=\"/index.php\" target=\"_blank\" title=\"中山大学南方学院\"><img src=\"/Public/Home/images/logo.png\"/></a></h1>\\n\\t\\t\\t\\t<div class=\"lin-topr\">\\n\\t\\t\\t\\t\\t<div class=\"lin-youxiang\">\\n\\t\\t\\t\\t\\t\\t<a href=\"http://oa.nfu.edu.cn/\" target=\"_blank\">办公系统</a>\\n\\t\\t\\t\\t\\t\\t<a href=\"http://en.nfu.edu.cn/\">English Version</a>\\n\\t\\t\\t\\t\\t\\t<!-- <a href=\"https://mail.nfu.edu.cn/\" target=\"_blank\">邮箱登录</a>\\n\\t\\t\\t\\t\\t\\t<a href=\"mailto:nfcsysuyz@126.com\" target=\"_blank\" title=\"nfcsysuyz@126.com\" >院长信箱</a> 
-->\\n\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t<div class=\"lin-ser lin-serhide\">\\n\\t\\t\\t\\t\\t\\t<div class=\"serbox\">\\n\\t\\t\\t\\t\\t\\t<form action=\"/index.php/home/article/search.html\" method=\"get\" id=\"search_form\">\\n\\t\\t\\t\\t\\t\\t\\t<input type=\"text\" name=\"keyword\" id=\"keyword\" placeholder=\"搜索\" />\\n\\t\\t\\t\\t\\t\\t\\t<a href=\"javascript:;\" id=\"search_btn\" ></a>\\n\\t\\t\\t\\t\\t\\t</form>\\t\\n\\t\\t\\t\\t\\t\\t<script type=\"text/javascript\">\\n\\t\\t\\t\\t\\t\\t\\t$(\"#search_btn\").click(function(){\\n\\t\\t\\t\\t\\t\\t\\t\\tvar keyword=$(\"#keyword\").val();\\n\\t\\t\\t\\t\\t\\t\\t\\tif(keyword==\\'\\'){\\n\\t\\t\\t\\t\\t\\t\\t\\t\\talert(\\'* 请输入搜索关键词 !\\');\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t$(\"#keyword\").focus();\\n\\t\\t\\t\\t\\t\\t\\t\\t\\treturn false;\\n\\t\\t\\t\\t\\t\\t\\t\\t}else{\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t$(\"#search_form\").submit();\\n\\t\\t\\t\\t\\t\\t\\t\\t}\\n\\t\\t\\t\\t\\t\\t\\t})\\n\\t\\t\\t\\t\\t\\t</script>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t\\t<!-- <span class=\"ser-biaoti\"><a href=\\'\\' style=\"color:#fff;\">English Version</a></span> -->\\n\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t</div>\\n\\t\\t\\t</div>\\n\\t\\t</div>\\n\\t\\t<!-- end 头部-->\\n\\t\\t<!--导航条-->\\n\\t\\t<div class=\"lin-navbar\">\\n\\t\\t\\t<p class=\"navnav\">\\n\\t\\t\\t\\t<span></span>\\n\\t\\t\\t\\t<span></span>\\n\\t\\t\\t\\t<span></span>\\n\\t\\t\\t</p>\\n\\t\\t\\t<ul class=\"lin-nav clearfix\">\\n\\t\\t\\t\\t<li  class=\"lin-navli\"><a href=\"/index.php\">首页</a>\\n\\t\\t\\t\\t</li>\\n\\t\\t\\t\\t<li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/29.html\"  target=\"_blank\">学校概况</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a 
href=\"/index.php/home/article/index/cid/29.html\" target=\"_self\">学校简介</a></li><li><a href=\"/index.php/home/article/index/cid/30.html\" target=\"_blank\">现任领导</a></li><li><a href=\"/index.php/home/article/index/cid/135.html\" target=\"_self\">校徽  校训  校歌</a></li><li><a href=\"/index.php/home/article/index/cid/34.html\" target=\"_blank\">南方大事记</a></li><li><a href=\"/index.php/home/article/index/cid/104.html\" target=\"_self\">学校校历</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/2.html\"  target=\"_self\">党建之窗</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/61.html\"  target=\"_blank\">机构设置</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/61.html\" target=\"_self\">院系设置</a></li><li><a href=\"/index.php/home/article/index/cid/36.html\" target=\"_self\">管理机构</a></li><li><a href=\"/index.php/home/article/index/cid/165.html\" target=\"_self\">常设委员会</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/31.html\"  target=\"_blank\">人才培养</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul 
clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/31.html\" target=\"_blank\">名师介绍</a></li><li><a href=\"/index.php/home/article/index/cid/163.html\" target=\"_self\">本科教育</a></li><li><a href=\"/index.php/home/article/index/cid/164.html\" target=\"_self\">继续教育</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/106.html\"  target=\"_blank\">教学科研</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/106.html\" target=\"_blank\">教务与科研部</a></li><li><a href=\"/index.php/home/article/index/cid/127.html\" target=\"_blank\">科研信息与动态</a></li><li><a href=\"/index.php/home/article/index/cid/107.html\" target=\"_blank\">科研机构</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/49.html\"  target=\"_blank\">招生就业</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/49.html\" target=\"_blank\">本科招生</a></li><li><a href=\"/index.php/home/article/index/cid/129.html\" target=\"_self\">继续教育</a></li><li><a href=\"/index.php/home/article/index/cid/50.html\" target=\"_blank\">就业服务</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li 
class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/79.html\"  target=\"_blank\">图书馆</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/79.html\" target=\"_blank\">图书馆</a></li><li><a href=\"/index.php/home/article/index/cid/80.html\" target=\"_blank\">档案室</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/159.html\"  target=\"_blank\">合作交流</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/159.html\" target=\"_blank\">国际交流</a></li><li><a href=\"/index.php/home/article/index/cid/161.html\" target=\"_blank\">外事服务</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/44.html\"  target=\"_blank\">人才招聘</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/44.html\" target=\"_blank\">教师系列</a></li><li><a href=\"/index.php/home/article/index/cid/45.html\" 
target=\"_blank\">管理系列</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li><li class=\"lin-navli\"><a href=\"/index.php/home/article/index/cid/32.html\"  target=\"_blank\">走进南方</a>\\n\\n\\t\\t\\t\\t\\t<!-- <i f condition=\"!empty($nav[\\'son_list\\']) and $nav[id] !=3 and  $nav[id] !=4 and $nav[id] !=5 and $nav[id] !=89\"> -->\\n\\t\\t\\t\\t\\t<div class=\"lin-navdiv\">\\n\\t\\t\\t\\t\\t\\t<div class=\"sonnav-bg\">\\n\\t\\t\\t\\t\\t\\t\\t<ul class=\"nav-conul clearfix\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<li><a href=\"/index.php/home/article/index/cid/32.html\" target=\"_blank\">图说南方</a></li><li><a href=\"/index.php/home/article/index/cid/105.html\" target=\"_self\">生活服务</a></li><li><a href=\"/index.php/home/article/index/cid/87.html\" target=\"_self\">医疗服务</a></li><li><a href=\"/index.php/home/article/index/cid/51.html\" target=\"_blank\">校报</a></li><li><a href=\"/index.php/home/article/index/cid/82.html\" target=\"_self\">交通指引</a></li>\\t\\t\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t</div>\\t\\t\\t\\t</li>\\n\\t\\t\\t</ul>\\n\\t\\t\\t\\n\\t\\t</div>\\n\\t\\t<div class=\"lin-navbg\"></div>\\n\\n\\r\\n<div class=\"lin-content\">\\r\\n\\t\\t\\t<div class=\"lin-neiye clearfix\">\\r\\n\\t\\t\\t\\t\\r\\n\\r\\n\\t\\t\\t    <div class=\"search_list_right\">\\r\\n\\t\\t\\t        <div class=\"fan clearfix\">\\r\\n\\t\\t\\t            <span class=\"fan_title\">站内搜索</span>\\r\\n\\t\\t\\t            <span class=\"fan_right\">您当前位置是：<a href=\"/index.php\">网站首页</a> &gt; <font>站内搜索</font></span>\\r\\n\\t\\t\\t        </div>\\r\\n\\t\\t\\t        <div class=\"ny_content\">\\r\\n\\t\\t\\t\\t\\t\\t<ul class=\"list-ul\">\\r\\n\\t\\t\\t\\t\\t\\t<li><font class=\"right-more\">2020-01-06</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/6363.html\" target=\"_blank\" title=\"文学与传媒学院教师获邀参加2020年U40中澳暑期工作营及国际学术研讨会\"><font 
color=red>文学与传媒学院</font>教师获邀参加2020年U40中澳暑期工作营及国际学术研讨会</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2020-01-06</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/6366.html\" target=\"_blank\" title=\"文学与传媒学院2019年学术研讨会暨总结大会顺利召开\"><font color=red>文学与传媒学院</font>2019年学术研讨会暨总结大会顺利召开</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-12-20</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/6318.html\" target=\"_blank\" title=\"展现当代青年的迷惘与奋进——我校文学与传媒学院大型原创舞台剧《春至》圆满落幕\">展现当代青年的迷惘与奋进——我校<font color=red>文学与传媒学院</font>大型原创舞台剧《春至》圆满落幕</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-11-22</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/6154.html\" target=\"_blank\" title=\"文学与传媒学院考研座谈暨2020年考研交流答疑会圆满结束\"><font color=red>文学与传媒学院</font>考研座谈暨2020年考研交流答疑会圆满结束</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-11-05</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/5348.html\" target=\"_blank\" title=\"文学与传媒学院教师招聘启事\"><font color=red>文学与传媒学院</font>教师招聘启事</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-11-04</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/6016.html\" target=\"_blank\" title=\"创意无限，未来可期——文学与传媒学院青马工程第四讲暨闭营仪式顺利举行\">创意无限，未来可期——<font color=red>文学与传媒学院</font>青马工程第四讲暨闭营仪式顺利举行</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-11-04</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/6019.html\" target=\"_blank\" title=\"垃圾分类我先行——文学与传媒学院“分门别类，谁与争锋”垃圾分类趣味知识竞赛决赛顺利举行\">垃圾分类我先行——<font color=red>文学与传媒学院</font>“分门别类，谁与争锋”垃圾分类趣味知识竞赛决赛顺利举行</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-09-16</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/5794.html\" target=\"_blank\" 
title=\"以梦为马，不负韶华——文学与传媒学院2019级新生开学典礼圆满结束\">以梦为马，不负韶华——<font color=red>文学与传媒学院</font>2019级新生开学典礼圆满结束</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-09-09</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/5776.html\" target=\"_blank\" title=\"文学与传媒学院学子在全国高校数字艺术设计大赛中斩获大奖\"><font color=red>文学与传媒学院</font>学子在全国高校数字艺术设计大赛中斩获大奖</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-09-09</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/5777.html\" target=\"_blank\" title=\"文学与传媒学院学子在第七届中国大学生公共关系策划大赛中喜获佳绩\"><font color=red>文学与传媒学院</font>学子在第七届中国大学生公共关系策划大赛中喜获佳绩</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-06-24</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/5642.html\" target=\"_blank\" title=\"倾心之作，致敬经典——文学与传媒学院紫阳戏剧社《倾城之恋》话剧展演圆满落幕\">倾心之作，致敬经典——<font color=red>文学与传媒学院</font>紫阳戏剧社《倾城之恋》话剧展演圆满落幕</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li><li><font class=\"right-more\">2019-06-24</font><div class=\"news_title\"><a href=\"/index.php/home/article/search_detail/id/5647.html\" target=\"_blank\" title=\"毕业季 | 今朝有离别，青春不散场 ——文学与传媒学院2019届毕业生毕业季系列活动有序开展\">毕业季 | 今朝有离别，青春不散场 ——<font color=red>文学与传媒学院</font>2019届毕业生毕业季系列活动有序开展</a></div>\\r\\n\\t\\t\\t\\t\\t\\t\\t</li>\\t\\t\\t\\t\\t\\t</ul>\\r\\n\\t\\t\\t\\t\\t\\t<div style=\"clear: both;\"></div>\\r\\n\\t\\t\\t\\t\\t\\t<div class=\"pages\" align=\"center\"><div>  <span class=\"current\">1</span><a class=\"num\" href=\"/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/2.html\">2</a><a class=\"num\" href=\"/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/3.html\">3</a><a class=\"num\" href=\"/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/4.html\">4</a><a class=\"num\" 
href=\"/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/5.html\">5</a><a class=\"num\" href=\"/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/6.html\">6</a><a class=\"num\" href=\"/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/7.html\">7</a> <a class=\"next\" href=\"/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/2.html\">>></a> </div></div>\\r\\n\\t\\t\\t        </div>\\r\\n\\t\\t\\t    </div>\\r\\n\\t\\t\\t</div>\\r\\n\\t\\t</div>\\r\\n\\t\\t<!-- end 内容区域-->\\r\\n\\r\\n\\t<!--底部-->\\n\\t\\t<div class=\"lin-footer\">\\n\\t\\t\\t<div class=\"lin-fer clearfix\">\\n\\t\\t\\t\\t<div class=\"ferleft\">\\n\\t\\t\\t\\t\\t<ul class=\"fer-ul clearfix\">\\n\\t\\t\\t\\t\\t\\t<li class=\"fer-li\"><a href=\"http://www.moe.gov.cn/\" target=\"_blank\" title=\"教育部\">教育部</a></li><li class=\"fer-li\"><a href=\"http://www.gz.gov.cn/\" target=\"_blank\" title=\"广州市政府\">广州市政府</a></li><li class=\"fer-li\"><a href=\"http://www.cnki.net/\" target=\"_blank\" title=\"中国知网\">中国知网</a></li><li class=\"fer-li\"><a href=\"http://edu.gd.gov.cn\" target=\"_blank\" title=\"广东省教育厅\">广东省教育厅</a></li><li class=\"fer-li\"><a href=\"http://www.gdpr.com/\" target=\"_blank\" title=\"珠江投资\">珠江投资</a></li><li class=\"fer-li\"><a href=\"http://journal.nfu.edu.cn/CN/volumn/home.shtml\" target=\"_blank\" title=\"南方论丛\">南方论丛</a></li><li class=\"fer-li\"><a href=\"http://www.sysu.edu.cn/\" target=\"_blank\" title=\"中山大学 \">中山大学 </a></li><li class=\"fer-li\"><a href=\"http://www.nfu.edu.cn/index.php/home/article/index/cid/136.html\" target=\"_blank\" title=\"珠江教育联盟\">珠江教育联盟</a></li>\\t\\n\\t\\t\\t\\t\\t\\t<li class=\"fer-li\"><a href=\"/index.php/home/article/link.html\">更多>></a></li>\\n\\t\\t\\t\\t\\t</ul>\\n\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t<div class=\"fercen\">\\n\\t\\t\\t\\t\\t<div 
class=\"fer-er\"><img src=\"/Public/Home/images/erweima1.jpg\"/></div>\\n\\t\\t\\t\\t\\t<div class=\"fer-er\"><img src=\"/Public/Home/images/erweima2.jpg\"/></div>\\n\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t<div class=\"ferright\">\\n\\t\\t\\t\\t\\t<div><p><span>地址：广州市从化区温泉大道882号中山大学南方学院</span><span>邮编：510970</span></p></div>\\n\\t\\t\\t\\t\\t<div class=\"addleft\">\\n\\t\\t\\t\\t\\t\\t<p>联系电话：020-61787326</p>\\n\\t\\t\\t\\t\\t\\t<p>版权所有 ©  中山大学南方学院</p>\\n\\t\\t\\t\\t\\t\\t<p>技术支持：<a href=\"http://www.unsun.net\" target=\"_blank\">碧辉腾乐(UNSUN.NET)</a></p>\\n\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t<div class=\"addright\">\\n\\t\\t\\t\\t\\t\\t<p>招生咨询：020-87912619</p> \\n\\t\\t\\t\\t\\t\\t<p>\\n\\t\\t\\t\\t\\t\\t\\t<span class=\"add-spante\"><a target=\"_blank\" href=\"http://www.beian.miit.gov.cn\">粤ICP备11077779号</a></span> \\n\\t\\t\\t\\t\\t\\t\\t<span class=\"add-spante\">\\n\\t\\t\\t\\t\\t\\t\\t\\t<a href=\"/index.php/admin/index/login.html\" target=\"_blank\" >网站管理</a>&nbsp;&nbsp;\\n\\t\\t\\t\\t\\t\\t\\t\\t<a href=\"http://old.nfu.edu.cn/\" target=\"_blank\" >旧站入口</a>\\n\\t\\t\\t\\t\\t\\t\\t</span>\\n\\t\\t\\t\\t\\t\\t</p>\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t</div>\\n\\t\\t\\t\\t\\n\\t\\t\\t\\t<div align=\"center\">\\n\\t\\t\\t\\t\\t\\t<a target=\"_blank\" href=\"http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=44011702000081\" style=\"display:inline-block;text-decoration:none;height:20px;line-height:20px;\"><img src=\"/Public/Home/images/icp.png\" style=\"float:left;\"/><p style=\"float:left;height:20px;line-height:20px;margin: 0px 0px 0px 5px; color:#ffffff;\">粤公网安备 44011702000081号</p></a>\\n\\t\\t\\t\\t</div>\\n\\t\\t\\t</div>\\n\\t\\t</div>\\n\\t\\t<!-- end 底部-->\\n\\n <script type=\"text/javascript\" language=\"javascript\">\\n \\n    //加入收藏\\n \\n        function AddFavorite(sURL, sTitle) {\\n \\n            sURL = encodeURI(sURL); \\n        try{   \\n \\n            
window.external.addFavorite(sURL, sTitle);   \\n \\n        }catch(e) {   \\n \\n            try{   \\n \\n                window.sidebar.addPanel(sTitle, sURL, \"\");   \\n \\n            }catch (e) {   \\n \\n                alert(\"加入收藏失败，请使用Ctrl+D进行添加,或手动在浏览器里进行设置.\");\\n \\n            }   \\n \\n        }\\n \\n    }\\n \\n    //设为首页\\n \\n    function SetHome(url){\\n \\n        if (document.all) {\\n \\n            document.body.style.behavior=\\'url(#default#homepage)\\';\\n \\n               document.body.setHomePage(url);\\n \\n        }else{\\n \\n            alert(\"您好,您的浏览器不支持自动设置页面为首页功能,请您手动在浏览器里设置该页面为首页!\");\\n \\n        }\\n \\n    }\\n \\n</script>\\n\\r\\n\\r\\n</body>\\r\\n</html>'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#  A2  nfu.edu.cn 搜 文学与传媒学院 r.html\n",
    "# r.html  (HTML 元素/标签) \n",
    "\n",
    "r.html.html  \n",
    "# 可存网页为文字档"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A3  nfu.edu.cn 搜 文学与传媒学院 保存备用\n",
    "\n",
    "with open (\"20春_Web数据挖掘_week02_nfu_文学与传媒学院.html\", encoding = \"utf8\", mode = \"w\") as fp:\n",
    "    fp.write(r.html.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A4  复习 读保存备用的任何文字档\n",
    "\n",
    "with open (\"20春_Web数据挖掘_week02_nfu_文学与传媒学院.html\", encoding = \"utf8\", mode = \"r\") as fp:\n",
    "    html_load = fp.read()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 解析文本\n",
    "![HTML head body](https://rlv.zcache.com/head_body_t_shirt-rd222a2cce3704f3b87fae4ee0fb73744_k2gm8_307.jpg)\n",
    "HTML文本的解析\n",
    "\n",
    "```python\n",
    "\n",
    "parsed = requests_html.soup_parse(html_load)\n",
    "```\n",
    "\n",
    "```python\n",
    "\n",
    "import requests_html\n",
    "# parsed = requests_html.soup_parse(html_load)\n",
    "from requests_html import soup_parse\n",
    "# parsed = soup_parse(html_load)\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A5  前方高能 HTML文本的解析\n",
    "\n",
    "import requests_html\n",
    "parsed = requests_html.soup_parse(html_load)\n",
    "解析后 = requests_html.soup_parse(html_load)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Element html at 0x11e495098>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "解析后   # <html> 元素标签"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Element body at 0x11e416188>]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "解析后.xpath('body')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Element head at 0x11e495688>]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "解析后.xpath('head')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['办公系统',\n",
       " 'English Version',\n",
       " '首页',\n",
       " '学校概况',\n",
       " '学校简介',\n",
       " '现任领导',\n",
       " '校徽  校训  校歌',\n",
       " '南方大事记',\n",
       " '学校校历',\n",
       " '党建之窗',\n",
       " '机构设置',\n",
       " '院系设置',\n",
       " '管理机构',\n",
       " '常设委员会',\n",
       " '人才培养',\n",
       " '名师介绍',\n",
       " '本科教育',\n",
       " '继续教育',\n",
       " '教学科研',\n",
       " '教务与科研部',\n",
       " '科研信息与动态',\n",
       " '科研机构',\n",
       " '招生就业',\n",
       " '本科招生',\n",
       " '继续教育',\n",
       " '就业服务',\n",
       " '图书馆',\n",
       " '图书馆',\n",
       " '档案室',\n",
       " '合作交流',\n",
       " '国际交流',\n",
       " '外事服务',\n",
       " '人才招聘',\n",
       " '教师系列',\n",
       " '管理系列',\n",
       " '走进南方',\n",
       " '图说南方',\n",
       " '生活服务',\n",
       " '医疗服务',\n",
       " '校报',\n",
       " '交通指引',\n",
       " '网站首页',\n",
       " '教师获邀参加2020年U40中澳暑期工作营及国际学术研讨会',\n",
       " '2019年学术研讨会暨总结大会顺利召开',\n",
       " '展现当代青年的迷惘与奋进——我校',\n",
       " '大型原创舞台剧《春至》圆满落幕',\n",
       " '考研座谈暨2020年考研交流答疑会圆满结束',\n",
       " '教师招聘启事',\n",
       " '创意无限，未来可期——',\n",
       " '青马工程第四讲暨闭营仪式顺利举行',\n",
       " '垃圾分类我先行——',\n",
       " '“分门别类，谁与争锋”垃圾分类趣味知识竞赛决赛顺利举行',\n",
       " '以梦为马，不负韶华——',\n",
       " '2019级新生开学典礼圆满结束',\n",
       " '学子在全国高校数字艺术设计大赛中斩获大奖',\n",
       " '学子在第七届中国大学生公共关系策划大赛中喜获佳绩',\n",
       " '倾心之作，致敬经典——',\n",
       " '紫阳戏剧社《倾城之恋》话剧展演圆满落幕',\n",
       " '毕业季 | 今朝有离别，青春不散场 ——',\n",
       " '2019届毕业生毕业季系列活动有序开展',\n",
       " '2',\n",
       " '3',\n",
       " '4',\n",
       " '5',\n",
       " '6',\n",
       " '7',\n",
       " '>>',\n",
       " '教育部',\n",
       " '广州市政府',\n",
       " '中国知网',\n",
       " '广东省教育厅',\n",
       " '珠江投资',\n",
       " '南方论丛',\n",
       " '中山大学 ',\n",
       " '珠江教育联盟',\n",
       " '更多>>',\n",
       " '碧辉腾乐(UNSUN.NET)',\n",
       " '粤ICP备11077779号',\n",
       " '网站管理',\n",
       " '旧站入口']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "解析后.xpath('//a/text()')  # greedy 所有<html> 元素标签"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['文学与传媒学院教师获邀参加2020年U40中澳暑期工作营及国际学术研讨会',\n",
       " '文学与传媒学院2019年学术研讨会暨总结大会顺利召开',\n",
       " '展现当代青年的迷惘与奋进——我校文学与传媒学院大型原创舞台剧《春至》圆满落幕',\n",
       " '文学与传媒学院考研座谈暨2020年考研交流答疑会圆满结束',\n",
       " '文学与传媒学院教师招聘启事',\n",
       " '创意无限，未来可期——文学与传媒学院青马工程第四讲暨闭营仪式顺利举行',\n",
       " '垃圾分类我先行——文学与传媒学院“分门别类，谁与争锋”垃圾分类趣味知识竞赛决赛顺利举行',\n",
       " '以梦为马，不负韶华——文学与传媒学院2019级新生开学典礼圆满结束',\n",
       " '文学与传媒学院学子在全国高校数字艺术设计大赛中斩获大奖',\n",
       " '文学与传媒学院学子在第七届中国大学生公共关系策划大赛中喜获佳绩',\n",
       " '倾心之作，致敬经典——文学与传媒学院紫阳戏剧社《倾城之恋》话剧展演圆满落幕',\n",
       " '毕业季 | 今朝有离别，青春不散场 ——文学与传媒学院2019届毕业生毕业季系列活动有序开展']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "解析后.xpath('//*[@class=\"news_title\"]//a/@title')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 直接使用\n",
    "\n",
    "```python\n",
    "r.html.xpath()\n",
    "```\n",
    "\n",
    "你来试试?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办', '观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学', '墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展', '大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影', '文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行', '传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行', '不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号', '传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开', '文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演', '文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖', '赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行', '文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束']\n",
      "['/index.php/home/article/search_detail/id/5648.html', '/index.php/home/article/search_detail/id/5625.html', '/index.php/home/article/search_detail/id/5632.html', '/index.php/home/article/search_detail/id/5614.html', '/index.php/home/article/search_detail/id/5584.html', '/index.php/home/article/search_detail/id/5523.html', '/index.php/home/article/search_detail/id/5508.html', '/index.php/home/article/search_detail/id/5509.html', '/index.php/home/article/search_detail/id/5511.html', '/index.php/home/article/search_detail/id/5500.html', '/index.php/home/article/search_detail/id/5455.html', '/index.php/home/article/search_detail/id/5456.html']\n",
      "['2019-06-24', '2019-06-20', '2019-06-20', '2019-06-19', '2019-06-13', '2019-05-30', '2019-05-27', '2019-05-27', '2019-05-27', '2019-05-24', '2019-05-17', '2019-05-17']\n"
     ]
    }
   ],
   "source": [
    "# A6  直接使用 requests-html\n",
    "payload = {\n",
    "    \"keyword\":\"文学与传媒学院\",\n",
    "    \"p\":\"2\"\n",
    "}\n",
    "\n",
    "session = HTMLSession()\n",
    "r = session.get(\"http://www.nfu.edu.cn/index.php/home/article/search.html\", params=payload)\n",
    "\n",
    "# 保存备用 (好习惯, 最好存一个地方)\n",
    "with open (\"20春_Web数据挖掘_week02_nfu_文学与传媒学院.html\", encoding = \"utf8\", mode = \"w\") as fp:\n",
    "    fp.write(r.html.html)\n",
    "\n",
    "# 解析文本直接使用\n",
    "print (r.html.xpath('//*[@class=\"news_title\"]/a/@title'))\n",
    "print (r.html.xpath('//*[@class=\"news_title\"]/a/@href'))\n",
    "print (r.html.xpath('//font[@class=\"right-more\"]/text()'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# xpath\n",
    "\n",
    "学生将实践\n",
    "\n",
    "* r.html 如 庖丁解牛\n",
    "* r.html.xapth() 挑牛肉吃\n",
    "  * **元素/标签如筋骨，值和文本通常才有牛肉**\n",
    "  * html以元素/标签构成\n",
    "  * 值和文本不单独存在，必需依附元素/标签\n",
    "* 使用xpath（不挑greedy vs. 及挑剔ungreedy的策略）\n",
    "\n",
    "* 获取标签tags丶属性attributes丶值values\n",
    "  * 掌握Chrome Inspector 多种颜色区分\n",
    "      * 元素/标签 elements/tags  ?色\n",
    "      * 属性attributes  ?色\n",
    "      * 值values ?色\n",
    "      * 文本 ?色\n",
    "      * HTML注解 ?色 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 取值及文本"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['文学与传媒学院教师获邀参加2020年U40中澳暑期工作营及国际学术研讨会',\n",
       " '文学与传媒学院2019年学术研讨会暨总结大会顺利召开',\n",
       " '展现当代青年的迷惘与奋进——我校文学与传媒学院大型原创舞台剧《春至》圆满落幕',\n",
       " '文学与传媒学院考研座谈暨2020年考研交流答疑会圆满结束',\n",
       " '文学与传媒学院教师招聘启事',\n",
       " '创意无限，未来可期——文学与传媒学院青马工程第四讲暨闭营仪式顺利举行',\n",
       " '垃圾分类我先行——文学与传媒学院“分门别类，谁与争锋”垃圾分类趣味知识竞赛决赛顺利举行',\n",
       " '以梦为马，不负韶华——文学与传媒学院2019级新生开学典礼圆满结束',\n",
       " '文学与传媒学院学子在全国高校数字艺术设计大赛中斩获大奖',\n",
       " '文学与传媒学院学子在第七届中国大学生公共关系策划大赛中喜获佳绩',\n",
       " '倾心之作，致敬经典——文学与传媒学院紫阳戏剧社《倾城之恋》话剧展演圆满落幕',\n",
       " '毕业季 | 今朝有离别，青春不散场 ——文学与传媒学院2019届毕业生毕业季系列活动有序开展']"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-A-1 使用 取值 观察xpath最后的内容\n",
    "解析后.xpath('//div[@class=\"news_title\"]/a/@title')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['教师获邀参加2020年U40中澳暑期工作营及国际学术研讨会',\n",
       " '2019年学术研讨会暨总结大会顺利召开',\n",
       " '展现当代青年的迷惘与奋进——我校',\n",
       " '大型原创舞台剧《春至》圆满落幕',\n",
       " '考研座谈暨2020年考研交流答疑会圆满结束',\n",
       " '教师招聘启事',\n",
       " '创意无限，未来可期——',\n",
       " '青马工程第四讲暨闭营仪式顺利举行',\n",
       " '垃圾分类我先行——',\n",
       " '“分门别类，谁与争锋”垃圾分类趣味知识竞赛决赛顺利举行',\n",
       " '以梦为马，不负韶华——',\n",
       " '2019级新生开学典礼圆满结束',\n",
       " '学子在全国高校数字艺术设计大赛中斩获大奖',\n",
       " '学子在第七届中国大学生公共关系策划大赛中喜获佳绩',\n",
       " '倾心之作，致敬经典——',\n",
       " '紫阳戏剧社《倾城之恋》话剧展演圆满落幕',\n",
       " '毕业季 | 今朝有离别，青春不散场 ——',\n",
       " '2019届毕业生毕业季系列活动有序开展']"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-A-2 使用 文本\n",
    "解析后.xpath('//div[@class=\"news_title\"]/a/text()')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# B-A-3 該你了\n",
    "# 你是否能解釋為什麼 B1 和 B2 結果不一樣 ?"
   ]
  },
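  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of one plausible explanation for the B-A-3 question, assuming the page wraps the search keyword in an inline tag inside each link. The HTML fragment, the `<em>` tag, and the names `snippet`, `tree`, `titles`, `texts`, and `full` are hypothetical; only the xpath expressions mirror B-A-1 and B-A-2. `lxml` is used directly here so the example runs without a network request.\n",
    "\n",
    "```python\n",
    "from lxml import html\n",
    "\n",
    "# Hypothetical fragment: the keyword sits in an inline <em> inside the link\n",
    "snippet = (\n",
    "    '<html><body><div class=\"news_title\">'\n",
    "    '<a title=\"Alpha keyword Beta\">Alpha <em>keyword</em> Beta</a>'\n",
    "    '</div></body></html>'\n",
    ")\n",
    "tree = html.fromstring(snippet)\n",
    "\n",
    "# @title reads one attribute value per link, so each link yields one string\n",
    "titles = tree.xpath('//div[@class=\"news_title\"]/a/@title')\n",
    "\n",
    "# text() returns only the text nodes that are direct children of <a>;\n",
    "# the text inside <em> belongs to <em>, so the link text comes back in pieces\n",
    "texts = tree.xpath('//div[@class=\"news_title\"]/a/text()')\n",
    "\n",
    "# string() concatenates all descendant text of the first matching <a>\n",
    "full = tree.xpath('string(//div[@class=\"news_title\"]/a)')\n",
    "\n",
    "print(titles)\n",
    "print(texts)\n",
    "print(full)\n",
    "```\n",
    "\n",
    "If the real page is built this way, the attribute is the safer source for a complete title, and `text()` is the right tool only when the pieces themselves are wanted."
   ]
  },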
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据科学家使用xpath的角度\n",
    "![](https://qxf2.com/blog/wp-content/uploads/2015/12/Table.png)\n",
    "* 不挑greedy的策略\n",
    "    * 求全求不漏\n",
    "    * 可能有垃圾\n",
    "    * 使用 // （后代descendant） 而不用  / 子女（children）\n",
    "    * 使用 **任意**元素/标签  而不用  **指定**元素/标签\n",
    "* 挑剔ungreedy的策略\n",
    "    * 求准求数据整齐\n",
    "    * 可能有漏数据\n",
    "    * 使用  / 子女（children） 而不用  // （后代descendant）\n",
    "    * 使用 **指定**元素/标签  而不用  **任意**元素/标签 "
   ]
  },
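  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two strategies above can be sketched on a toy page before trying them on the real one. The HTML below and the names `doc`, `greedy`, `child_only`, and `picky` are made up for illustration; the point is only how `//`, `/`, `*`, and a named tag change what comes back.\n",
    "\n",
    "```python\n",
    "from lxml import html\n",
    "\n",
    "# Hypothetical page: one menu link and one news link\n",
    "doc = html.fromstring(\n",
    "    '<html><body>'\n",
    "    '<div id=\"menu\"><a title=\"home\">Home</a></div>'\n",
    "    '<ul><li><div class=\"news_title\"><a title=\"story\">Story</a></div></li></ul>'\n",
    "    '</body></html>'\n",
    ")\n",
    "\n",
    "# Greedy: any element at any depth that carries a title attribute\n",
    "greedy = doc.xpath('//*/@title')\n",
    "\n",
    "# Children only: every step must be a direct child of the previous one,\n",
    "# so the link nested under <ul>/<li> is not reached\n",
    "child_only = doc.xpath('/html/body/div/a/@title')\n",
    "\n",
    "# Ungreedy: a named tag with a specific class, then its direct <a> child\n",
    "picky = doc.xpath('//div[@class=\"news_title\"]/a/@title')\n",
    "\n",
    "print(greedy)\n",
    "print(child_only)\n",
    "print(picky)\n",
    "```\n",
    "\n",
    "The greedy query returns both titles, menu junk included; the picky query returns only the news title but would return nothing if the site renamed the class."
   ]
  },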
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 不挑greedy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Element 'a' href='/index.php' target='_blank' title='中山大学南方学院'>,\n",
       " <Element 'a' href='http://oa.nfu.edu.cn/' target='_blank'>,\n",
       " <Element 'a' href='http://en.nfu.edu.cn/'>,\n",
       " <Element 'a' href='javascript:;' id='search_btn'>,\n",
       " <Element 'a' href='/index.php'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/29.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/29.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/30.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/135.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/34.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/104.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/2.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/61.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/61.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/36.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/165.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/31.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/31.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/163.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/164.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/106.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/106.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/127.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/107.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/49.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/49.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/129.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/50.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/79.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/79.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/80.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/159.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/159.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/161.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/44.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/44.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/45.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/32.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/32.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/105.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/87.html' target='_self'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/51.html' target='_blank'>,\n",
       " <Element 'a' href='/index.php/home/article/index/cid/82.html' target='_self'>,\n",
       " <Element 'a' href='/index.php'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5648.html' target='_blank' title='对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5625.html' target='_blank' title='观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5632.html' target='_blank' title='墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5614.html' target='_blank' title='大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5584.html' target='_blank' title='文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5523.html' target='_blank' title='传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5508.html' target='_blank' title='不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5509.html' target='_blank' title='传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5511.html' target='_blank' title='文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5500.html' target='_blank' title='文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5455.html' target='_blank' title='赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5456.html' target='_blank' title='文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束'>,\n",
       " <Element 'a' class=('prev',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/1.html'>,\n",
       " <Element 'a' class=('num',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/1.html'>,\n",
       " <Element 'a' class=('num',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/3.html'>,\n",
       " <Element 'a' class=('num',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/4.html'>,\n",
       " <Element 'a' class=('num',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/5.html'>,\n",
       " <Element 'a' class=('num',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/6.html'>,\n",
       " <Element 'a' class=('num',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/7.html'>,\n",
       " <Element 'a' class=('next',) href='/index.php/home/article/search/keyword/%E6%96%87%E5%AD%A6%E4%B8%8E%E4%BC%A0%E5%AA%92%E5%AD%A6%E9%99%A2/p/3.html'>,\n",
       " <Element 'a' href='http://www.moe.gov.cn/' target='_blank' title='教育部'>,\n",
       " <Element 'a' href='http://www.gz.gov.cn/' target='_blank' title='广州市政府'>,\n",
       " <Element 'a' href='http://www.cnki.net/' target='_blank' title='中国知网'>,\n",
       " <Element 'a' href='http://edu.gd.gov.cn' target='_blank' title='广东省教育厅'>,\n",
       " <Element 'a' href='http://www.gdpr.com/' target='_blank' title='珠江投资'>,\n",
       " <Element 'a' href='http://journal.nfu.edu.cn/CN/volumn/home.shtml' target='_blank' title='南方论丛'>,\n",
       " <Element 'a' href='http://www.sysu.edu.cn/' target='_blank' title='中山大学 '>,\n",
       " <Element 'a' href='http://www.nfu.edu.cn/index.php/home/article/index/cid/136.html' target='_blank' title='珠江教育联盟'>,\n",
       " <Element 'a' href='/index.php/home/article/link.html'>,\n",
       " <Element 'a' href='http://www.unsun.net' target='_blank'>,\n",
       " <Element 'a' href='http://www.beian.miit.gov.cn' target='_blank'>,\n",
       " <Element 'a' href='/index.php/admin/index/login.html' target='_blank'>,\n",
       " <Element 'a' href='http://old.nfu.edu.cn/' target='_blank'>,\n",
       " <Element 'a' href='http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=44011702000081' style='display:inline-block;text-decoration:none;height:20px;line-height:20px;' target='_blank'>]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-B-1 使用 xpath \n",
    "r.html.xpath('//a')  # greedy 不挑 所有 <a> 元素标签"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['中山大学南方学院',\n",
       " '文学与传媒学院教师获邀参加2020年U40中澳暑期工作营及国际学术研讨会',\n",
       " '文学与传媒学院2019年学术研讨会暨总结大会顺利召开',\n",
       " '展现当代青年的迷惘与奋进——我校文学与传媒学院大型原创舞台剧《春至》圆满落幕',\n",
       " '文学与传媒学院考研座谈暨2020年考研交流答疑会圆满结束',\n",
       " '文学与传媒学院教师招聘启事',\n",
       " '创意无限，未来可期——文学与传媒学院青马工程第四讲暨闭营仪式顺利举行',\n",
       " '垃圾分类我先行——文学与传媒学院“分门别类，谁与争锋”垃圾分类趣味知识竞赛决赛顺利举行',\n",
       " '以梦为马，不负韶华——文学与传媒学院2019级新生开学典礼圆满结束',\n",
       " '文学与传媒学院学子在全国高校数字艺术设计大赛中斩获大奖',\n",
       " '文学与传媒学院学子在第七届中国大学生公共关系策划大赛中喜获佳绩',\n",
       " '倾心之作，致敬经典——文学与传媒学院紫阳戏剧社《倾城之恋》话剧展演圆满落幕',\n",
       " '毕业季 | 今朝有离别，青春不散场 ——文学与传媒学院2019届毕业生毕业季系列活动有序开展',\n",
       " '教育部',\n",
       " '广州市政府',\n",
       " '中国知网',\n",
       " '广东省教育厅',\n",
       " '珠江投资',\n",
       " '南方论丛',\n",
       " '中山大学 ',\n",
       " '珠江教育联盟']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-B-2 使用 xpath 限定 取特定属性\n",
    "# 注意和 B1 的内容相比, 是不是少了一些? \n",
    "# 没有特定属性title就不会被选到\n",
    "解析后.xpath('//a/@title')  # less greedy 有点挑 <a> 元素标签"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['中山大学南方学院',\n",
       " '对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办',\n",
       " '观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学',\n",
       " '墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展',\n",
       " '大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影',\n",
       " '文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行',\n",
       " '传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行',\n",
       " '不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号',\n",
       " '传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开',\n",
       " '文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演',\n",
       " '文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖',\n",
       " '赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行',\n",
       " '文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束',\n",
       " '教育部',\n",
       " '广州市政府',\n",
       " '中国知网',\n",
       " '广东省教育厅',\n",
       " '珠江投资',\n",
       " '南方论丛',\n",
       " '中山大学 ',\n",
       " '珠江教育联盟']"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-B-3 使用 xpath  \n",
    "# greedy 不挑元素/标签  只挑有特定属性title的所有元素\n",
    "r.html.xpath('//*/@title')  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 挑剔ungreedy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办',\n",
       " '观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学',\n",
       " '墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展',\n",
       " '大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影',\n",
       " '文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行',\n",
       " '传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行',\n",
       " '不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号',\n",
       " '传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开',\n",
       " '文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演',\n",
       " '文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖',\n",
       " '赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行',\n",
       " '文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束']"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-B-4 使用 xpath  # ungreedy 更精準\n",
    "r.html.xpath('//div[@class=\"news_title\"]/a/@title')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Element 'a' href='/index.php/home/article/search_detail/id/5648.html' target='_blank' title='对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5625.html' target='_blank' title='观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5632.html' target='_blank' title='墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5614.html' target='_blank' title='大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5584.html' target='_blank' title='文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5523.html' target='_blank' title='传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5508.html' target='_blank' title='不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5509.html' target='_blank' title='传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5511.html' target='_blank' title='文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5500.html' target='_blank' title='文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5455.html' target='_blank' title='赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行'>,\n",
       " <Element 'a' href='/index.php/home/article/search_detail/id/5456.html' target='_blank' title='文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束'>]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-B-5 使用 xpath  # ungreedy 最精准\n",
    "#   xpath 完全没有用 // 也没有用 * \n",
    "r.html.xpath('body/div[@class=\"lin-content\"]/div[@class=\"lin-neiye clearfix\"]/div[@class=\"search_list_right\"]/div[@class=\"ny_content\"]/ul[@class=\"list-ul\"]/li/div[@class=\"news_title\"]/a')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Xpath Axis](http://krum.rz.uni-mannheim.de/inet-2005/images/xpath-axis.gif)\n",
    "## 更多xpath\n",
    "xpath是一门在XML文档（包括html，以樹狀為主的純本文結構文檔）中查找信息的语言\n",
    "\n",
    "2. 熟悉 [xpath 语法](https://www.w3cschool.cn/xpath/xpath-syntax.html)丶[xpath 节点](https://www.w3cschool.cn/xpath/xpath-nodes.html)\n",
    "    * 节点\n",
    "        * 元素丶属性丶文本丶命名空间丶文档（根）结点\n",
    "    * 节点关系 \n",
    "        * 父母（parent） vs.先辈（ancestor）\n",
    "        * 子女（children） vs. 后代（descendant）\n",
    "        * 同胞（sibling）\n",
    "3. 使用 [xpath cheatsheet](https://devhints.io/xpath)\n",
    "  * 在 Chrome Inspector 使用\n",
    "  * 在 requests-html (Python) 使用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办', '观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学', '墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展', '大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影', '文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行', '传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行', '不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号', '传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开', '文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演', '文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖', '赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行', '文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束']\n"
     ]
    }
   ],
   "source": [
    "# B-C-1 \n",
    "print (r.html.xpath('//div[@class=\"news_title\"]/a/@title'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['/index.php/home/article/search_detail/id/5648.html', '/index.php/home/article/search_detail/id/5625.html', '/index.php/home/article/search_detail/id/5632.html', '/index.php/home/article/search_detail/id/5614.html', '/index.php/home/article/search_detail/id/5584.html', '/index.php/home/article/search_detail/id/5523.html', '/index.php/home/article/search_detail/id/5508.html', '/index.php/home/article/search_detail/id/5509.html', '/index.php/home/article/search_detail/id/5511.html', '/index.php/home/article/search_detail/id/5500.html', '/index.php/home/article/search_detail/id/5455.html', '/index.php/home/article/search_detail/id/5456.html']\n"
     ]
    }
   ],
   "source": [
    "# B-C-2\n",
    "print (r.html.xpath('//div[@class=\"news_title\"]/a/@href'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['2019-06-24', '2019-06-20', '2019-06-20', '2019-06-19', '2019-06-13', '2019-05-30', '2019-05-27', '2019-05-27', '2019-05-27', '2019-05-24', '2019-05-17', '2019-05-17']\n"
     ]
    }
   ],
   "source": [
    "# B-C-3\n",
    "print (r.html.xpath('//font[@class=\"right-more\"]/text()'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['2019-06-24', '2019-06-20', '2019-06-20', '2019-06-19', '2019-06-13', '2019-05-30', '2019-05-27', '2019-05-27', '2019-05-27', '2019-05-24', '2019-05-17', '2019-05-17']\n"
     ]
    }
   ],
   "source": [
    "# B-C-4\n",
    "print (r.html.xpath('//div[@class=\"news_title\"]/preceding-sibling::font/text()'))\n",
    "\n",
    "## 廖老师主张这个 B-C-4代码，会比B-C-3更好，你能不能从B-C-1, 及 B-C-2观察xpath语法，猜猜为什麽"
   ]
  },
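  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible reading of the B-C-4 question, offered as a sketch and not as the instructor's answer: B-C-4 starts from the same `div[@class=\"news_title\"]` anchor as B-C-1 and B-C-2, so the date is found relative to its own title. The toy listing below is hypothetical (its second row has no date), as are the names `page`, `titles`, `dates`, and `rows`.\n",
    "\n",
    "```python\n",
    "from lxml import html\n",
    "\n",
    "# Hypothetical listing: the second row is missing its date element\n",
    "page = html.fromstring(\n",
    "    '<ul>'\n",
    "    '<li><font class=\"right-more\">2019-06-24</font>'\n",
    "    '<div class=\"news_title\"><a title=\"A\">A</a></div></li>'\n",
    "    '<li><div class=\"news_title\"><a title=\"B\">B</a></div></li>'\n",
    "    '<li><font class=\"right-more\">2019-06-19</font>'\n",
    "    '<div class=\"news_title\"><a title=\"C\">C</a></div></li>'\n",
    "    '</ul>'\n",
    ")\n",
    "\n",
    "# Column by column: two unrelated queries, so the lists can differ in length\n",
    "titles = page.xpath('//div[@class=\"news_title\"]/a/@title')\n",
    "dates = page.xpath('//font[@class=\"right-more\"]/text()')\n",
    "\n",
    "# Row by row: anchor on each title block, then look at its own sibling\n",
    "rows = []\n",
    "for block in page.xpath('//div[@class=\"news_title\"]'):\n",
    "    title = block.xpath('./a/@title')[0]\n",
    "    date = block.xpath('./preceding-sibling::font/text()')\n",
    "    rows.append((title, date[0] if date else None))\n",
    "\n",
    "print(len(titles), len(dates))\n",
    "print(rows)\n",
    "```\n",
    "\n",
    "With column-wise lists of unequal length, `pd.DataFrame` built from a dict of lists raises a `ValueError`, and lists that happen to be equal in length can still pair a title with the wrong date. Row-wise extraction keeps each record together and leaves an explicit `None` where a field is missing."
   ]
  },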
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用pandas 輸出xlsx\n",
    "4. 简易使用 [pd.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>标题</th>\n",
       "      <th>链结</th>\n",
       "      <th>日期</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5648....</td>\n",
       "      <td>2019-06-24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5625....</td>\n",
       "      <td>2019-06-20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5632....</td>\n",
       "      <td>2019-06-20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5614....</td>\n",
       "      <td>2019-06-19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5584....</td>\n",
       "      <td>2019-06-13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5523....</td>\n",
       "      <td>2019-05-30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5508....</td>\n",
       "      <td>2019-05-27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5509....</td>\n",
       "      <td>2019-05-27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5511....</td>\n",
       "      <td>2019-05-27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5500....</td>\n",
       "      <td>2019-05-24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5455....</td>\n",
       "      <td>2019-05-17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束</td>\n",
       "      <td>/index.php/home/article/search_detail/id/5456....</td>\n",
       "      <td>2019-05-17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                标题  \\\n",
       "0                      对话校友，嘉木成林——文学与传媒学院校友访谈会顺利举办   \n",
       "1              观一代文豪，品苏诗之美——从化区作协主席刘尚阳先生来文学与传媒学院讲学   \n",
       "2                   墨香淡淡，“书”途同归——文学与传媒学院旧书捐赠活动顺利开展   \n",
       "3         大咖行于市井，定格烟火光影——羊晚高级记者叶健强先生来文学与传媒学院精讲新闻摄影   \n",
       "4                文学与传媒学院第三届“南方·世新”两岸两校毕业联展暨交流会顺利举行   \n",
       "5               传播天下 “媒”梦有你——文学与传媒学院学生优秀作品展开幕式顺利举行   \n",
       "6     不忘初心，勇攀高峰——文学与传媒学院团总支以第一名成绩喜提学校“优秀团委（团总支）”称号   \n",
       "7          传道授业，提疑解惑——文学与传媒学院期中座谈会新闻学与网络与新媒体专场顺利召开   \n",
       "8                   文学与传媒学院团学团队参加我校校园文化艺术节闭幕式暨文艺汇演   \n",
       "9             文学与传媒学院学生作品在南粤校园中华经典诵读文化艺术节中荣获大学组一等奖   \n",
       "10                 赛出实力，教出风采——文学与传媒学院第六届教师教学竞赛顺利举行   \n",
       "11  文学与传媒学院首届“传播天下，‘媒’梦有你”专业文化活动月开幕式暨第十届最强讲师决赛圆满结束   \n",
       "\n",
       "                                                   链结          日期  \n",
       "0   /index.php/home/article/search_detail/id/5648....  2019-06-24  \n",
       "1   /index.php/home/article/search_detail/id/5625....  2019-06-20  \n",
       "2   /index.php/home/article/search_detail/id/5632....  2019-06-20  \n",
       "3   /index.php/home/article/search_detail/id/5614....  2019-06-19  \n",
       "4   /index.php/home/article/search_detail/id/5584....  2019-06-13  \n",
       "5   /index.php/home/article/search_detail/id/5523....  2019-05-30  \n",
       "6   /index.php/home/article/search_detail/id/5508....  2019-05-27  \n",
       "7   /index.php/home/article/search_detail/id/5509....  2019-05-27  \n",
       "8   /index.php/home/article/search_detail/id/5511....  2019-05-27  \n",
       "9   /index.php/home/article/search_detail/id/5500....  2019-05-24  \n",
       "10  /index.php/home/article/search_detail/id/5455....  2019-05-17  \n",
       "11  /index.php/home/article/search_detail/id/5456....  2019-05-17  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# B-D-1 pd.DataFrame 建构，pandas课有教\n",
    "df = pd.DataFrame( {\n",
    "         \"标题\": r.html.xpath('//div[@class=\"news_title\"]/a/@title'),\n",
    "         \"链结\": r.html.xpath('//div[@class=\"news_title\"]/a/@href'),\n",
    "         \"日期\": r.html.xpath('//font[@class=\"right-more\"]/text()'),\n",
    "     } )\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# B-D-2 pd.DataFrame 输出excel，pandas课有教\n",
    "df.to_excel(\"20春_Web数据挖掘_week02_nfu_文学与传媒学院.xlsx\", sheet_name=\"搜查结果\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 本周小结内容\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 打开Excel档看成果"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 课后练习及下周项目m.liepin.com\n",
    "\n",
    "使用 xpath 应用 [m.liepin.com](https://m.liepin.com/zhaopin/)\n",
    "\n",
    "你是数据科学家，这m.liepin.com有什麽样的牛肉，你打算要怎麽抓？\n",
    "* 工作名称\n",
    "* 工作地点\n",
    "* 工作$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [],
   "source": [
    "# C-1 Fetch a single results page\n",
    "url = \"https://m.liepin.com/zhaopin/?keyword=pandas&dqs=000&salarylow=0&salaryhigh=999&industrys=000&compScale=000&compKind=000&pubtime=000&jobkind=&d_headId=f60ea6ccf0d1a048ea4dae4e6772566f&d_ckId=0101a36235d5c8fee9e4dad6d5abbd15&d_sfrom=search_unknown&d_curPage=0&d_pageSize=60&siTag=OkO0-IMl2UAiwO-pwvovWg~0-4CWIKi_yBwz6jEoh_n7w\"\n",
    "session = HTMLSession()\n",
    "r = session.get( url )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [],
   "source": [
    "# C-2 Save the HTML for later use\n",
    "with open(\"20春_Web数据挖掘_week02_zhaopin_pandas.html\", encoding=\"utf8\", mode=\"w\") as fp:\n",
    "    fp.write(r.html.html)"
   ]
  },
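  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once the page is saved in C-2, XPath expressions can be tried offline without requesting the site again. The cell below is a minimal sketch, assuming `lxml` is installed. To stay self-contained it writes a tiny made-up HTML snippet to a temporary file instead of reading the real saved page; the snippet's structure is only a guess modelled on the `flex-2 job-name` class used in C-3, and the names `sample_html` and `saved_page` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "from lxml import html as lxml_html\n",
    "\n",
    "# Made-up snippet; the real page structure may differ.\n",
    "sample_html = (\n",
    "    '<ul><li><a class=\"flex-2 job-name\" href=\"https://m.liepin.com/job/1926863845.shtml\">'\n",
    "    '<span>数据分析助理</span></a></li></ul>'\n",
    ")\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    saved_page = Path(tmp_dir) / \"sample_zhaopin.html\"\n",
    "    saved_page.write_text(sample_html, encoding=\"utf8\")\n",
    "    # Parse the saved file instead of fetching the URL again.\n",
    "    tree = lxml_html.fromstring(saved_page.read_text(encoding=\"utf8\"))\n",
    "\n",
    "titles = tree.xpath('//a[@class=\"flex-2 job-name\"]/span/text()')\n",
    "links = tree.xpath('//a[@class=\"flex-2 job-name\"]/@href')\n",
    "print(titles, links)"
   ]
  },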
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>职称</th>\n",
       "      <th>链结</th>\n",
       "      <th>薪水</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>pandas招聘专场进行时</td>\n",
       "      <td>/register/?imscid=R000010827</td>\n",
       "      <td>10万起</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>数据分析助理</td>\n",
       "      <td>https://m.liepin.com/job/1926863845.shtml</td>\n",
       "      <td>5-7k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>大数据开发工程师（python方向）</td>\n",
       "      <td>https://m.liepin.com/job/1924369011.shtml</td>\n",
       "      <td>10-15k·17薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Python全栈工程师</td>\n",
       "      <td>https://m.liepin.com/job/1925461677.shtml</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>数据工程师</td>\n",
       "      <td>https://m.liepin.com/job/1921388735.shtml</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>数据分析工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926276091.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>数据分析师</td>\n",
       "      <td>https://m.liepin.com/job/1919747421.shtml</td>\n",
       "      <td>5-7k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>高级建模工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926176307.shtml</td>\n",
       "      <td>30-50k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>pandas招聘专场进行时</td>\n",
       "      <td>/register/?imscid=R000010827</td>\n",
       "      <td>10万起</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>高级数字工程师 Senior Digital Engineer</td>\n",
       "      <td>https://m.liepin.com/a/18914473.shtml</td>\n",
       "      <td>20-23k·13薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>高级建模工程师</td>\n",
       "      <td>https://m.liepin.com/a/18938787.shtml</td>\n",
       "      <td>40-70k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>ETL开发工程师</td>\n",
       "      <td>https://m.liepin.com/job/1917906737.shtml</td>\n",
       "      <td>25-45k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Senior Software Engineer R&amp;D</td>\n",
       "      <td>https://m.liepin.com/job/1921261397.shtml</td>\n",
       "      <td>10-20k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>量化策略研究员（Alpha/日内）</td>\n",
       "      <td>https://m.liepin.com/job/1919242021.shtml</td>\n",
       "      <td>15-35k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>IT Big Data Senior Data Mining Specialist(J221...</td>\n",
       "      <td>https://m.liepin.com/job/1921389841.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>BI报表开发工程师-初级</td>\n",
       "      <td>https://m.liepin.com/job/1922621879.shtml</td>\n",
       "      <td>8-10k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>python开发工程师</td>\n",
       "      <td>https://m.liepin.com/job/1921991221.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>数据科学家（金融风控方向） (MJ000216)</td>\n",
       "      <td>https://m.liepin.com/job/1926930637.shtml</td>\n",
       "      <td>24-45k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>股票量化研究员（留用）</td>\n",
       "      <td>https://m.liepin.com/job/1926869697.shtml</td>\n",
       "      <td>2-4k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>python数据工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926839195.shtml</td>\n",
       "      <td>8-15k·13薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Senior Data Scientist - Machine Learning(J10081)</td>\n",
       "      <td>https://m.liepin.com/job/1926660117.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>pandas招聘专场进行时</td>\n",
       "      <td>/register/?imscid=R000010827</td>\n",
       "      <td>10万起</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>智能运维算法工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926622143.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>大数据算法开发工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926607011.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>模型工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926568881.shtml</td>\n",
       "      <td>20-30k·14薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>自动化算法工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926429071.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>python</td>\n",
       "      <td>https://m.liepin.com/job/1926411279.shtml</td>\n",
       "      <td>12-30k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>python</td>\n",
       "      <td>https://m.liepin.com/job/1926350257.shtml</td>\n",
       "      <td>15-25k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>java架构师</td>\n",
       "      <td>https://m.liepin.com/job/1926350095.shtml</td>\n",
       "      <td>20-29k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Java架构师</td>\n",
       "      <td>https://m.liepin.com/job/1926262805.shtml</td>\n",
       "      <td>25-35k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257735.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257733.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257731.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257729.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257727.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257725.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257723.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257721.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257719.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257717.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>Python Architect/Developer   Python架构师/开发</td>\n",
       "      <td>https://m.liepin.com/job/1926257715.shtml</td>\n",
       "      <td>26-40k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>python初级开发工程师 python开发 后端开发工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926158241.shtml</td>\n",
       "      <td>7-11k·13薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>python软件开发工程师</td>\n",
       "      <td>https://m.liepin.com/job/1926042123.shtml</td>\n",
       "      <td>12-20k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>量化交易实习生（留用）</td>\n",
       "      <td>https://m.liepin.com/job/1925922457.shtml</td>\n",
       "      <td>2-4k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>python后端开发</td>\n",
       "      <td>https://m.liepin.com/job/1925839777.shtml</td>\n",
       "      <td>面议</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>数据开发工程师</td>\n",
       "      <td>https://m.liepin.com/job/1925811589.shtml</td>\n",
       "      <td>10-15k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>NLP工程师</td>\n",
       "      <td>https://m.liepin.com/job/1925516103.shtml</td>\n",
       "      <td>18-25k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>大数据/AI课程讲师/教研员</td>\n",
       "      <td>https://m.liepin.com/job/1925380633.shtml</td>\n",
       "      <td>10-25k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>python工程师</td>\n",
       "      <td>https://m.liepin.com/job/1925321953.shtml</td>\n",
       "      <td>15-30k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>数据挖掘工程师（高薪好福利）</td>\n",
       "      <td>https://m.liepin.com/job/1924833243.shtml</td>\n",
       "      <td>13-18k·14薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>数据挖掘工程师（高薪好发展）</td>\n",
       "      <td>https://m.liepin.com/job/1924833175.shtml</td>\n",
       "      <td>12-18k·14薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>数据挖掘工程师</td>\n",
       "      <td>https://m.liepin.com/job/1924830999.shtml</td>\n",
       "      <td>10-16k·14薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>Python高级工程师（数据处理）</td>\n",
       "      <td>https://m.liepin.com/job/1924574829.shtml</td>\n",
       "      <td>20-30k·13薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56</th>\n",
       "      <td>数据分析师</td>\n",
       "      <td>https://m.liepin.com/job/1924531121.shtml</td>\n",
       "      <td>15-20k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>大数据开发工程师</td>\n",
       "      <td>https://m.liepin.com/job/1924513877.shtml</td>\n",
       "      <td>20-30k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>机器学习数据处理工程师</td>\n",
       "      <td>https://m.liepin.com/job/1924392255.shtml</td>\n",
       "      <td>8-16k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>量化研究员</td>\n",
       "      <td>https://m.liepin.com/job/1924265823.shtml</td>\n",
       "      <td>12-15k·14薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60</th>\n",
       "      <td>数据分析工程师</td>\n",
       "      <td>https://m.liepin.com/job/1923135935.shtml</td>\n",
       "      <td>8-15k·13薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61</th>\n",
       "      <td>数据分析工程师</td>\n",
       "      <td>https://m.liepin.com/job/1922744879.shtml</td>\n",
       "      <td>20-25k·12薪</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62</th>\n",
       "      <td>数据分析师</td>\n",
       "      <td>https://m.liepin.com/job/1921177929.shtml</td>\n",
       "      <td>20-25k·12薪</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>63 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   职称  \\\n",
       "0                                       pandas招聘专场进行时   \n",
       "1                                             数据分析助理    \n",
       "2                                 大数据开发工程师（python方向）    \n",
       "3                                        Python全栈工程师    \n",
       "4                                              数据工程师    \n",
       "5                                            数据分析工程师    \n",
       "6                                              数据分析师    \n",
       "7                                            高级建模工程师    \n",
       "8                                       pandas招聘专场进行时   \n",
       "9                    高级数字工程师 Senior Digital Engineer    \n",
       "10                                           高级建模工程师    \n",
       "11                                          ETL开发工程师    \n",
       "12                      Senior Software Engineer R&D    \n",
       "13                                 量化策略研究员（Alpha/日内）    \n",
       "14  IT Big Data Senior Data Mining Specialist(J221...   \n",
       "15                                      BI报表开发工程师-初级    \n",
       "16                                       python开发工程师    \n",
       "17                          数据科学家（金融风控方向） (MJ000216)    \n",
       "18                                       股票量化研究员（留用）    \n",
       "19                                       python数据工程师    \n",
       "20  Senior Data Scientist - Machine Learning(J10081)    \n",
       "21                                      pandas招聘专场进行时   \n",
       "22                                         智能运维算法工程师    \n",
       "23                                        大数据算法开发工程师    \n",
       "24                                             模型工程师    \n",
       "25                                          自动化算法工程师    \n",
       "26                                            python    \n",
       "27                                            python    \n",
       "28                                           java架构师    \n",
       "29                                           Java架构师    \n",
       "..                                                ...   \n",
       "33         Python Architect/Developer   Python架构师/开发    \n",
       "34         Python Architect/Developer   Python架构师/开发    \n",
       "35         Python Architect/Developer   Python架构师/开发    \n",
       "36         Python Architect/Developer   Python架构师/开发    \n",
       "37         Python Architect/Developer   Python架构师/开发    \n",
       "38         Python Architect/Developer   Python架构师/开发    \n",
       "39         Python Architect/Developer   Python架构师/开发    \n",
       "40         Python Architect/Developer   Python架构师/开发    \n",
       "41         Python Architect/Developer   Python架构师/开发    \n",
       "42         Python Architect/Developer   Python架构师/开发    \n",
       "43         Python Architect/Developer   Python架构师/开发    \n",
       "44                    python初级开发工程师 python开发 后端开发工程师    \n",
       "45                                     python软件开发工程师    \n",
       "46                                       量化交易实习生（留用）    \n",
       "47                                        python后端开发    \n",
       "48                                           数据开发工程师    \n",
       "49                                            NLP工程师    \n",
       "50                                    大数据/AI课程讲师/教研员    \n",
       "51                                         python工程师    \n",
       "52                                    数据挖掘工程师（高薪好福利）    \n",
       "53                                    数据挖掘工程师（高薪好发展）    \n",
       "54                                           数据挖掘工程师    \n",
       "55                                 Python高级工程师（数据处理）    \n",
       "56                                             数据分析师    \n",
       "57                                          大数据开发工程师    \n",
       "58                                       机器学习数据处理工程师    \n",
       "59                                             量化研究员    \n",
       "60                                           数据分析工程师    \n",
       "61                                           数据分析工程师    \n",
       "62                                             数据分析师    \n",
       "\n",
       "                                           链结          薪水  \n",
       "0                /register/?imscid=R000010827        10万起  \n",
       "1   https://m.liepin.com/job/1926863845.shtml    5-7k·12薪  \n",
       "2   https://m.liepin.com/job/1924369011.shtml  10-15k·17薪  \n",
       "3   https://m.liepin.com/job/1925461677.shtml  15-25k·12薪  \n",
       "4   https://m.liepin.com/job/1921388735.shtml  10-20k·12薪  \n",
       "5   https://m.liepin.com/job/1926276091.shtml          面议  \n",
       "6   https://m.liepin.com/job/1919747421.shtml    5-7k·12薪  \n",
       "7   https://m.liepin.com/job/1926176307.shtml  30-50k·12薪  \n",
       "8                /register/?imscid=R000010827        10万起  \n",
       "9       https://m.liepin.com/a/18914473.shtml  20-23k·13薪  \n",
       "10      https://m.liepin.com/a/18938787.shtml  40-70k·12薪  \n",
       "11  https://m.liepin.com/job/1917906737.shtml  25-45k·12薪  \n",
       "12  https://m.liepin.com/job/1921261397.shtml  10-20k·12薪  \n",
       "13  https://m.liepin.com/job/1919242021.shtml  15-35k·12薪  \n",
       "14  https://m.liepin.com/job/1921389841.shtml          面议  \n",
       "15  https://m.liepin.com/job/1922621879.shtml   8-10k·12薪  \n",
       "16  https://m.liepin.com/job/1921991221.shtml          面议  \n",
       "17  https://m.liepin.com/job/1926930637.shtml  24-45k·12薪  \n",
       "18  https://m.liepin.com/job/1926869697.shtml    2-4k·12薪  \n",
       "19  https://m.liepin.com/job/1926839195.shtml   8-15k·13薪  \n",
       "20  https://m.liepin.com/job/1926660117.shtml          面议  \n",
       "21               /register/?imscid=R000010827        10万起  \n",
       "22  https://m.liepin.com/job/1926622143.shtml          面议  \n",
       "23  https://m.liepin.com/job/1926607011.shtml          面议  \n",
       "24  https://m.liepin.com/job/1926568881.shtml  20-30k·14薪  \n",
       "25  https://m.liepin.com/job/1926429071.shtml          面议  \n",
       "26  https://m.liepin.com/job/1926411279.shtml  12-30k·12薪  \n",
       "27  https://m.liepin.com/job/1926350257.shtml  15-25k·12薪  \n",
       "28  https://m.liepin.com/job/1926350095.shtml  20-29k·12薪  \n",
       "29  https://m.liepin.com/job/1926262805.shtml  25-35k·12薪  \n",
       "..                                        ...         ...  \n",
       "33  https://m.liepin.com/job/1926257735.shtml  26-40k·12薪  \n",
       "34  https://m.liepin.com/job/1926257733.shtml  26-40k·12薪  \n",
       "35  https://m.liepin.com/job/1926257731.shtml  26-40k·12薪  \n",
       "36  https://m.liepin.com/job/1926257729.shtml  26-40k·12薪  \n",
       "37  https://m.liepin.com/job/1926257727.shtml  26-40k·12薪  \n",
       "38  https://m.liepin.com/job/1926257725.shtml  26-40k·12薪  \n",
       "39  https://m.liepin.com/job/1926257723.shtml  26-40k·12薪  \n",
       "40  https://m.liepin.com/job/1926257721.shtml  26-40k·12薪  \n",
       "41  https://m.liepin.com/job/1926257719.shtml  26-40k·12薪  \n",
       "42  https://m.liepin.com/job/1926257717.shtml  26-40k·12薪  \n",
       "43  https://m.liepin.com/job/1926257715.shtml  26-40k·12薪  \n",
       "44  https://m.liepin.com/job/1926158241.shtml   7-11k·13薪  \n",
       "45  https://m.liepin.com/job/1926042123.shtml  12-20k·12薪  \n",
       "46  https://m.liepin.com/job/1925922457.shtml    2-4k·12薪  \n",
       "47  https://m.liepin.com/job/1925839777.shtml          面议  \n",
       "48  https://m.liepin.com/job/1925811589.shtml  10-15k·12薪  \n",
       "49  https://m.liepin.com/job/1925516103.shtml  18-25k·12薪  \n",
       "50  https://m.liepin.com/job/1925380633.shtml  10-25k·12薪  \n",
       "51  https://m.liepin.com/job/1925321953.shtml  15-30k·12薪  \n",
       "52  https://m.liepin.com/job/1924833243.shtml  13-18k·14薪  \n",
       "53  https://m.liepin.com/job/1924833175.shtml  12-18k·14薪  \n",
       "54  https://m.liepin.com/job/1924830999.shtml  10-16k·14薪  \n",
       "55  https://m.liepin.com/job/1924574829.shtml  20-30k·13薪  \n",
       "56  https://m.liepin.com/job/1924531121.shtml  15-20k·12薪  \n",
       "57  https://m.liepin.com/job/1924513877.shtml  20-30k·12薪  \n",
       "58  https://m.liepin.com/job/1924392255.shtml   8-16k·12薪  \n",
       "59  https://m.liepin.com/job/1924265823.shtml  12-15k·14薪  \n",
       "60  https://m.liepin.com/job/1923135935.shtml   8-15k·13薪  \n",
       "61  https://m.liepin.com/job/1922744879.shtml  20-25k·12薪  \n",
       "62  https://m.liepin.com/job/1921177929.shtml  20-25k·12薪  \n",
       "\n",
       "[63 rows x 3 columns]"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C-3\n",
    "# Easy fields: job title (职称), link (链结), salary (薪水)\n",
    "df = pd.DataFrame( {\n",
    "         \"职称\": r.html.xpath('//ul/li/a/span/text()'),\n",
    "         \"链结\": r.html.xpath('//div[@class=\"job-card\"]/dl/dd/ul/li/a[@class=\"flex-2 job-name\"]/@href'),\n",
    "         \"薪水\": r.html.xpath('//div[@class=\"job-card\"]/dl/dd/ul/li/span[@class=\"text-warning flex-1\"]/text()'),\n",
    "     } )\n",
    "\n",
    "df"
   ]
  },
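  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The C-3 result mixes real postings with a promotional entry (`pandas招聘专场进行时`) whose link is the relative path `/register/?imscid=R000010827`. The cell below is a minimal sketch, not part of the original lecture, of one way to drop such rows and turn any relative link into an absolute one with `urllib.parse.urljoin`. It assumes that every `/register/` link is a promotion and uses `https://m.liepin.com/zhaopin/` as the base URL; the names `sample_jobs`, `is_promo`, and `jobs_only` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.parse import urljoin\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "BASE_URL = \"https://m.liepin.com/zhaopin/\"\n",
    "\n",
    "# Three rows copied from the C-3 output above.\n",
    "sample_jobs = pd.DataFrame({\n",
    "    \"职称\": [\"pandas招聘专场进行时\", \"数据分析助理\", \"高级建模工程师\"],\n",
    "    \"链结\": [\n",
    "        \"/register/?imscid=R000010827\",\n",
    "        \"https://m.liepin.com/job/1926863845.shtml\",\n",
    "        \"https://m.liepin.com/a/18938787.shtml\",\n",
    "    ],\n",
    "    \"薪水\": [\"10万起\", \"5-7k·12薪\", \"40-70k·12薪\"],\n",
    "})\n",
    "\n",
    "# Assumption: links under /register/ are promotions, not job postings.\n",
    "is_promo = sample_jobs[\"链结\"].str.startswith(\"/register/\")\n",
    "jobs_only = sample_jobs.loc[~is_promo].reset_index(drop=True)\n",
    "\n",
    "# urljoin leaves absolute URLs unchanged and resolves relative ones against BASE_URL.\n",
    "jobs_only[\"链结\"] = jobs_only[\"链结\"].map(lambda href: urljoin(BASE_URL, href))\n",
    "print(jobs_only)"
   ]
  },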
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "60\n",
      "60\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>公司地点</th>\n",
       "      <th>公司名称</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>广州</td>\n",
       "      <td>深圳市智灵时代科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>武汉-关山</td>\n",
       "      <td>武汉中海庭数据技术有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>北京</td>\n",
       "      <td>因诺(上海)资产管理有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>北京</td>\n",
       "      <td>因诺(上海)资产管理有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>武汉-江夏区</td>\n",
       "      <td>武汉彼欧英瑞杰汽车系统有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>成都-太升路</td>\n",
       "      <td>易居企业（中国）集团有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>杭州-滨江区</td>\n",
       "      <td>信泊尔企业管理咨询(上海)有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>北京</td>\n",
       "      <td>某外资环境能源公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>杭州</td>\n",
       "      <td>超头部数据智能供应链公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>北京</td>\n",
       "      <td>爱笔(北京)智能科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>上海</td>\n",
       "      <td>维塔士</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>北京-朝阳区</td>\n",
       "      <td>北京和正投资管理有限责任公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>武汉-流芳</td>\n",
       "      <td>捷信消费金融有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>武汉-武昌区</td>\n",
       "      <td>上海罗盘信息科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>北京</td>\n",
       "      <td>新聚思北京</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>北京</td>\n",
       "      <td>北京奥米智信科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>上海</td>\n",
       "      <td>上海凯纳璞淳资产管理有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>上海</td>\n",
       "      <td>上海软科教育信息咨询有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>上海-浦东新区</td>\n",
       "      <td>德比软件</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>上海-浦东新区</td>\n",
       "      <td>深圳平安综合金融服务有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>上海</td>\n",
       "      <td>上海数岳信息科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>北京-西城区</td>\n",
       "      <td>北京知因智慧科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>深圳</td>\n",
       "      <td>深圳埃克斯工业自动化有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>深圳</td>\n",
       "      <td>软通动力</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>深圳</td>\n",
       "      <td>软通动力</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>深圳</td>\n",
       "      <td>软通动力</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>深圳</td>\n",
       "      <td>软通动力</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>长沙</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>西安</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>青岛</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>大连</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>天津</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>苏州</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>南京</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>深圳</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>广州</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>武汉</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>杭州</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>成都</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>上海</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>北京</td>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>深圳-科技园</td>\n",
       "      <td>深圳米筐科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>上海</td>\n",
       "      <td>翼健(上海)信息科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>上海</td>\n",
       "      <td>上海凯纳璞淳资产管理有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>深圳-福田区</td>\n",
       "      <td>深圳平安综合金融服务有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>上海-静安区</td>\n",
       "      <td>深圳市今古科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>深圳</td>\n",
       "      <td>深圳市前海唯艾咨询服务有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>北京-北下关</td>\n",
       "      <td>北京伟东教育科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>深圳-龙华区</td>\n",
       "      <td>安吉康尔(深圳)科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>杭州</td>\n",
       "      <td>量知</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>杭州</td>\n",
       "      <td>量知</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>杭州</td>\n",
       "      <td>量知</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>上海-徐汇区</td>\n",
       "      <td>眼控科技</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>北京</td>\n",
       "      <td>北京集奥聚合</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>上海-徐汇区</td>\n",
       "      <td>上海牙木通讯技术有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>北京-海淀区</td>\n",
       "      <td>瑞鑫天算</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56</th>\n",
       "      <td>北京-广渠门</td>\n",
       "      <td>上海富唐资产管理有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>广州-天河区</td>\n",
       "      <td>广东蔚海数问大数据科技有限公司</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>北京-海淀区</td>\n",
       "      <td>微加普惠</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>北京-万寿寺</td>\n",
       "      <td>微加普惠</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       公司地点               公司名称\n",
       "0        广州      深圳市智灵时代科技有限公司\n",
       "1     武汉-关山      武汉中海庭数据技术有限公司\n",
       "2        北京     因诺(上海)资产管理有限公司\n",
       "3        北京     因诺(上海)资产管理有限公司\n",
       "4    武汉-江夏区    武汉彼欧英瑞杰汽车系统有限公司\n",
       "5    成都-太升路     易居企业（中国）集团有限公司\n",
       "6    杭州-滨江区  信泊尔企业管理咨询(上海)有限公司\n",
       "7        北京          某外资环境能源公司\n",
       "8        杭州       超头部数据智能供应链公司\n",
       "9        北京     爱笔(北京)智能科技有限公司\n",
       "10       上海                维塔士\n",
       "11   北京-朝阳区     北京和正投资管理有限责任公司\n",
       "12    武汉-流芳         捷信消费金融有限公司\n",
       "13   武汉-武昌区       上海罗盘信息科技有限公司\n",
       "14       北京              新聚思北京\n",
       "15       北京       北京奥米智信科技有限公司\n",
       "16       上海     上海凯纳璞淳资产管理有限公司\n",
       "17       上海     上海软科教育信息咨询有限公司\n",
       "18  上海-浦东新区               德比软件\n",
       "19  上海-浦东新区     深圳平安综合金融服务有限公司\n",
       "20       上海       上海数岳信息科技有限公司\n",
       "21   北京-西城区       北京知因智慧科技有限公司\n",
       "22       深圳     深圳埃克斯工业自动化有限公司\n",
       "23       深圳               软通动力\n",
       "24       深圳               软通动力\n",
       "25       深圳               软通动力\n",
       "26       深圳               软通动力\n",
       "27       长沙         天津恒程科技有限公司\n",
       "28       西安         天津恒程科技有限公司\n",
       "29       青岛         天津恒程科技有限公司\n",
       "30       大连         天津恒程科技有限公司\n",
       "31       天津         天津恒程科技有限公司\n",
       "32       苏州         天津恒程科技有限公司\n",
       "33       南京         天津恒程科技有限公司\n",
       "34       深圳         天津恒程科技有限公司\n",
       "35       广州         天津恒程科技有限公司\n",
       "36       武汉         天津恒程科技有限公司\n",
       "37       杭州         天津恒程科技有限公司\n",
       "38       成都         天津恒程科技有限公司\n",
       "39       上海         天津恒程科技有限公司\n",
       "40       北京         天津恒程科技有限公司\n",
       "41   深圳-科技园         深圳米筐科技有限公司\n",
       "42       上海     翼健(上海)信息科技有限公司\n",
       "43       上海     上海凯纳璞淳资产管理有限公司\n",
       "44   深圳-福田区     深圳平安综合金融服务有限公司\n",
       "45   上海-静安区        深圳市今古科技有限公司\n",
       "46       深圳    深圳市前海唯艾咨询服务有限公司\n",
       "47   北京-北下关       北京伟东教育科技有限公司\n",
       "48   深圳-龙华区     安吉康尔(深圳)科技有限公司\n",
       "49       杭州                 量知\n",
       "50       杭州                 量知\n",
       "51       杭州                 量知\n",
       "52   上海-徐汇区               眼控科技\n",
       "53       北京             北京集奥聚合\n",
       "54   上海-徐汇区       上海牙木通讯技术有限公司\n",
       "55   北京-海淀区               瑞鑫天算\n",
       "56   北京-广渠门       上海富唐资产管理有限公司\n",
       "57   广州-天河区    广东蔚海数问大数据科技有限公司\n",
       "58   北京-海淀区               微加普惠\n",
       "59   北京-万寿寺               微加普惠"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C-4\n",
    "# Medium: '公司地点' (company location), '公司名称' (company name)\n",
    "\n",
    "# Your code?\n",
    "\n",
    "# Get the company locations\n",
    "company_place = r.html.xpath('//dd[@class=\"right-info\"]/ul/li[3]/a/text()')\n",
    "# Remove every '不限' (no restriction) entry\n",
    "company_place = [p for p in company_place if p != '不限']\n",
    "print(len(company_place))\n",
    "\n",
    "# Get the company names\n",
    "company_name = r.html.xpath('//div[@class=\"job-card\"]/dl/dd/ul/li[2]/a/text()')\n",
    "print(len(company_name))\n",
    "\n",
    "数据 = pd.DataFrame({\n",
    "        \"公司地点\": company_place,\n",
    "        \"公司名称\": company_name,\n",
    "        })\n",
    "数据\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>公司名称</th>\n",
       "      <th>公司URL</th>\n",
       "      <th>时间</th>\n",
       "      <th>经验</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>深圳市智灵时代科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9793933/}</td>\n",
       "      <td>12小时前</td>\n",
       "      <td>经验不限 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>武汉中海庭数据技术有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9644059/}</td>\n",
       "      <td>5小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>因诺(上海)资产管理有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/8537745/}</td>\n",
       "      <td>前天</td>\n",
       "      <td>2年以上 硕士及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>因诺(上海)资产管理有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/8537745/}</td>\n",
       "      <td>前天</td>\n",
       "      <td>1年以上 硕士及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>武汉彼欧英瑞杰汽车系统有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9360320/}</td>\n",
       "      <td>2020-03-13</td>\n",
       "      <td>1年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>易居企业（中国）集团有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/2283683/}</td>\n",
       "      <td>2020-03-02</td>\n",
       "      <td>1年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>信泊尔企业管理咨询(上海)有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9962617/}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>某外资环境能源公司</td>\n",
       "      <td>{}</td>\n",
       "      <td>2020-03-18</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>超头部数据智能供应链公司</td>\n",
       "      <td>{}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>爱笔(北京)智能科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9402103/}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>维塔士</td>\n",
       "      <td>{https://m.liepin.com/company/3215924/}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>1年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>北京和正投资管理有限责任公司</td>\n",
       "      <td>{https://m.liepin.com/company/8677826/}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>2年以上 硕士及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>捷信消费金融有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/8660469/}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>上海罗盘信息科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/8473643/}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>1年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>新聚思北京</td>\n",
       "      <td>{https://m.liepin.com/company/519268/}</td>\n",
       "      <td>一个月前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>北京奥米智信科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/12156569/}</td>\n",
       "      <td>16小时前</td>\n",
       "      <td>经验不限 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>上海凯纳璞淳资产管理有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9816201/}</td>\n",
       "      <td>10小时前</td>\n",
       "      <td>经验不限 硕士及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>上海软科教育信息咨询有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/8160826/}</td>\n",
       "      <td>3小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>德比软件</td>\n",
       "      <td>{https://m.liepin.com/company/6959941/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>深圳平安综合金融服务有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/7956861/}</td>\n",
       "      <td>7小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>上海数岳信息科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10275779/}</td>\n",
       "      <td>9小时前</td>\n",
       "      <td>2年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>北京知因智慧科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9314063/}</td>\n",
       "      <td>3小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>深圳埃克斯工业自动化有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9510654/}</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>3年以上 硕士及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>软通动力</td>\n",
       "      <td>{https://m.liepin.com/company/7865459/}</td>\n",
       "      <td>7小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>软通动力</td>\n",
       "      <td>{https://m.liepin.com/company/7865459/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>软通动力</td>\n",
       "      <td>{https://m.liepin.com/company/7865459/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>软通动力</td>\n",
       "      <td>{https://m.liepin.com/company/7865459/}</td>\n",
       "      <td>7小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>天津恒程科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10023177/}</td>\n",
       "      <td>2小时前</td>\n",
       "      <td>5年以上 学历不限</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>深圳米筐科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/8527953/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>1年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>翼健(上海)信息科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9415155/}</td>\n",
       "      <td>10小时前</td>\n",
       "      <td>经验不限 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>上海凯纳璞淳资产管理有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9816201/}</td>\n",
       "      <td>10小时前</td>\n",
       "      <td>经验不限 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>深圳平安综合金融服务有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/7956861/}</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>3年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>深圳市今古科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9313211/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>1年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>深圳市前海唯艾咨询服务有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10079733/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>北京伟东教育科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9807023/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>安吉康尔(深圳)科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9127399/}</td>\n",
       "      <td>12小时前</td>\n",
       "      <td>5年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>量知</td>\n",
       "      <td>{https://m.liepin.com/company/8744113/}</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>1年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>量知</td>\n",
       "      <td>{https://m.liepin.com/company/8744113/}</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>1年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>量知</td>\n",
       "      <td>{https://m.liepin.com/company/8744113/}</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>1年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>眼控科技</td>\n",
       "      <td>{https://m.liepin.com/company/8511023/}</td>\n",
       "      <td>11小时前</td>\n",
       "      <td>5年以上 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>北京集奥聚合</td>\n",
       "      <td>{https://m.liepin.com/company/6143905/}</td>\n",
       "      <td>4小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>上海牙木通讯技术有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/8053602/}</td>\n",
       "      <td>8小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>瑞鑫天算</td>\n",
       "      <td>{https://m.liepin.com/company/10140641/}</td>\n",
       "      <td>5小时前</td>\n",
       "      <td>5年以上 硕士及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56</th>\n",
       "      <td>上海富唐资产管理有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/10225577/}</td>\n",
       "      <td>12小时前</td>\n",
       "      <td>经验不限 统招本科</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>广东蔚海数问大数据科技有限公司</td>\n",
       "      <td>{https://m.liepin.com/company/9777417/}</td>\n",
       "      <td>6小时前</td>\n",
       "      <td>2年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>微加普惠</td>\n",
       "      <td>{https://m.liepin.com/company/9724239/}</td>\n",
       "      <td>5小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>微加普惠</td>\n",
       "      <td>{https://m.liepin.com/company/9724239/}</td>\n",
       "      <td>5小时前</td>\n",
       "      <td>3年以上 本科及以上</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 公司名称                                     公司URL          时间  \\\n",
       "0       深圳市智灵时代科技有限公司   {https://m.liepin.com/company/9793933/}       12小时前   \n",
       "1       武汉中海庭数据技术有限公司   {https://m.liepin.com/company/9644059/}        5小时前   \n",
       "2      因诺(上海)资产管理有限公司   {https://m.liepin.com/company/8537745/}          前天   \n",
       "3      因诺(上海)资产管理有限公司   {https://m.liepin.com/company/8537745/}          前天   \n",
       "4     武汉彼欧英瑞杰汽车系统有限公司   {https://m.liepin.com/company/9360320/}  2020-03-13   \n",
       "5      易居企业（中国）集团有限公司   {https://m.liepin.com/company/2283683/}  2020-03-02   \n",
       "6   信泊尔企业管理咨询(上海)有限公司   {https://m.liepin.com/company/9962617/}        一个月前   \n",
       "7           某外资环境能源公司                                        {}  2020-03-18   \n",
       "8        超头部数据智能供应链公司                                        {}        一个月前   \n",
       "9      爱笔(北京)智能科技有限公司   {https://m.liepin.com/company/9402103/}        一个月前   \n",
       "10                维塔士   {https://m.liepin.com/company/3215924/}        一个月前   \n",
       "11     北京和正投资管理有限责任公司   {https://m.liepin.com/company/8677826/}        一个月前   \n",
       "12         捷信消费金融有限公司   {https://m.liepin.com/company/8660469/}        一个月前   \n",
       "13       上海罗盘信息科技有限公司   {https://m.liepin.com/company/8473643/}        一个月前   \n",
       "14              新聚思北京    {https://m.liepin.com/company/519268/}        一个月前   \n",
       "15       北京奥米智信科技有限公司  {https://m.liepin.com/company/12156569/}       16小时前   \n",
       "16     上海凯纳璞淳资产管理有限公司   {https://m.liepin.com/company/9816201/}       10小时前   \n",
       "17     上海软科教育信息咨询有限公司   {https://m.liepin.com/company/8160826/}        3小时前   \n",
       "18               德比软件   {https://m.liepin.com/company/6959941/}       11小时前   \n",
       "19     深圳平安综合金融服务有限公司   {https://m.liepin.com/company/7956861/}        7小时前   \n",
       "20       上海数岳信息科技有限公司  {https://m.liepin.com/company/10275779/}        9小时前   \n",
       "21       北京知因智慧科技有限公司   {https://m.liepin.com/company/9314063/}        3小时前   \n",
       "22     深圳埃克斯工业自动化有限公司   {https://m.liepin.com/company/9510654/}        6小时前   \n",
       "23               软通动力   {https://m.liepin.com/company/7865459/}        7小时前   \n",
       "24               软通动力   {https://m.liepin.com/company/7865459/}       11小时前   \n",
       "25               软通动力   {https://m.liepin.com/company/7865459/}       11小时前   \n",
       "26               软通动力   {https://m.liepin.com/company/7865459/}        7小时前   \n",
       "27         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "28         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "29         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "30         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "31         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "32         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "33         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "34         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "35         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "36         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "37         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "38         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "39         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "40         天津恒程科技有限公司  {https://m.liepin.com/company/10023177/}        2小时前   \n",
       "41         深圳米筐科技有限公司   {https://m.liepin.com/company/8527953/}       11小时前   \n",
       "42     翼健(上海)信息科技有限公司   {https://m.liepin.com/company/9415155/}       10小时前   \n",
       "43     上海凯纳璞淳资产管理有限公司   {https://m.liepin.com/company/9816201/}       10小时前   \n",
       "44     深圳平安综合金融服务有限公司   {https://m.liepin.com/company/7956861/}        6小时前   \n",
       "45        深圳市今古科技有限公司   {https://m.liepin.com/company/9313211/}       11小时前   \n",
       "46    深圳市前海唯艾咨询服务有限公司  {https://m.liepin.com/company/10079733/}       11小时前   \n",
       "47       北京伟东教育科技有限公司   {https://m.liepin.com/company/9807023/}       11小时前   \n",
       "48     安吉康尔(深圳)科技有限公司   {https://m.liepin.com/company/9127399/}       12小时前   \n",
       "49                 量知   {https://m.liepin.com/company/8744113/}        6小时前   \n",
       "50                 量知   {https://m.liepin.com/company/8744113/}        6小时前   \n",
       "51                 量知   {https://m.liepin.com/company/8744113/}        6小时前   \n",
       "52               眼控科技   {https://m.liepin.com/company/8511023/}       11小时前   \n",
       "53             北京集奥聚合   {https://m.liepin.com/company/6143905/}        4小时前   \n",
       "54       上海牙木通讯技术有限公司   {https://m.liepin.com/company/8053602/}        8小时前   \n",
       "55               瑞鑫天算  {https://m.liepin.com/company/10140641/}        5小时前   \n",
       "56       上海富唐资产管理有限公司  {https://m.liepin.com/company/10225577/}       12小时前   \n",
       "57    广东蔚海数问大数据科技有限公司   {https://m.liepin.com/company/9777417/}        6小时前   \n",
       "58               微加普惠   {https://m.liepin.com/company/9724239/}        5小时前   \n",
       "59               微加普惠   {https://m.liepin.com/company/9724239/}        5小时前   \n",
       "\n",
       "            经验  \n",
       "0    经验不限 学历不限  \n",
       "1   5年以上 本科及以上  \n",
       "2   2年以上 硕士及以上  \n",
       "3   1年以上 硕士及以上  \n",
       "4   1年以上 本科及以上  \n",
       "5   1年以上 本科及以上  \n",
       "6   3年以上 本科及以上  \n",
       "7    5年以上 统招本科  \n",
       "8   2年以上 本科及以上  \n",
       "9   5年以上 本科及以上  \n",
       "10   1年以上 学历不限  \n",
       "11  2年以上 硕士及以上  \n",
       "12  5年以上 本科及以上  \n",
       "13  1年以上 本科及以上  \n",
       "14   3年以上 统招本科  \n",
       "15  经验不限 本科及以上  \n",
       "16  经验不限 硕士及以上  \n",
       "17  3年以上 本科及以上  \n",
       "18  3年以上 本科及以上  \n",
       "19   3年以上 统招本科  \n",
       "20   2年以上 统招本科  \n",
       "21   3年以上 统招本科  \n",
       "22  3年以上 硕士及以上  \n",
       "23   3年以上 统招本科  \n",
       "24  3年以上 本科及以上  \n",
       "25  5年以上 本科及以上  \n",
       "26  5年以上 本科及以上  \n",
       "27   5年以上 学历不限  \n",
       "28   5年以上 学历不限  \n",
       "29   5年以上 学历不限  \n",
       "30   5年以上 学历不限  \n",
       "31   5年以上 学历不限  \n",
       "32   5年以上 学历不限  \n",
       "33   5年以上 学历不限  \n",
       "34   5年以上 学历不限  \n",
       "35   5年以上 学历不限  \n",
       "36   5年以上 学历不限  \n",
       "37   5年以上 学历不限  \n",
       "38   5年以上 学历不限  \n",
       "39   5年以上 学历不限  \n",
       "40   5年以上 学历不限  \n",
       "41   1年以上 统招本科  \n",
       "42   经验不限 统招本科  \n",
       "43  经验不限 本科及以上  \n",
       "44   3年以上 统招本科  \n",
       "45  1年以上 本科及以上  \n",
       "46  2年以上 本科及以上  \n",
       "47  5年以上 本科及以上  \n",
       "48  5年以上 本科及以上  \n",
       "49   1年以上 统招本科  \n",
       "50  1年以上 本科及以上  \n",
       "51  1年以上 本科及以上  \n",
       "52   5年以上 统招本科  \n",
       "53  3年以上 本科及以上  \n",
       "54  2年以上 本科及以上  \n",
       "55  5年以上 硕士及以上  \n",
       "56   经验不限 统招本科  \n",
       "57  2年以上 本科及以上  \n",
       "58  3年以上 本科及以上  \n",
       "59  3年以上 本科及以上  "
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C-5\n",
    "# Hard: '公司URL' (company URL), '时间' (posting time), '经验' (experience)\n",
    "\n",
    "import pandas as pd\n",
    "from requests_html import HTMLSession\n",
    "\n",
    "url = \"https://m.liepin.com/zhaopin/?keyword=pandas&dqs=000&salarylow=0&salaryhigh=999&industrys=000&compScale=000&compKind=000&pubtime=000&jobkind=&d_headId=f60ea6ccf0d1a048ea4dae4e6772566f&d_ckId=0101a36235d5c8fee9e4dad6d5abbd15&d_sfrom=search_unknown&d_curPage=0&d_pageSize=60&siTag=OkO0-IMl2UAiwO-pwvovWg~0-4CWIKi_yBwz6jEoh_n7w\"\n",
    "session = HTMLSession()\n",
    "r = session.get(url)\n",
    "\n",
    "# Get the posting times\n",
    "job_time = r.html.xpath('//dd[@class=\"right-info\"]/ul/li[3]/time/text()')\n",
    "# Drop promoted ads, which appear with the time '1分钟前' (1 minute ago)\n",
    "job_time = [t for t in job_time if t != '1分钟前']\n",
    "# print(len(job_time))\n",
    "\n",
    "# Select the company link element of every job card\n",
    "news = r.html.xpath('//dd[@class=\"right-info\"]/ul/li[2]/a')\n",
    "# print(news)\n",
    "\n",
    "# Get the company names and company URLs\n",
    "company_name = []\n",
    "company_url = []\n",
    "for i in news:\n",
    "    company_name.append(i.text)\n",
    "    # absolute_links is a set, so each entry is a (possibly empty) set of company URLs\n",
    "    company_url.append(i.absolute_links)\n",
    "# print(company_name)\n",
    "# print(company_url)\n",
    "\n",
    "# Get the experience and education requirements\n",
    "job_exp_bad = r.html.xpath('//dd[@class=\"right-info\"]/ul/li[3]/text()')\n",
    "\n",
    "# Strip surrounding whitespace and drop the empty strings left by line breaks\n",
    "job_exp = [x.strip() for x in job_exp_bad if x.strip() != '']\n",
    "# Drop the promoted-ad entries\n",
    "ad_text = '学历不限\\xa0经验不限'\n",
    "job_exp = [x for x in job_exp if x != ad_text]\n",
    "# print(len(job_exp))\n",
    "# print(job_exp)\n",
    "\n",
    "# All four lists must have the same length, or pd.DataFrame raises a ValueError\n",
    "数据C5 = pd.DataFrame({\n",
    "        \"公司名称\": company_name,\n",
    "        \"公司URL\": company_url,\n",
    "        \"时间\": job_time,\n",
    "        \"经验\": job_exp,\n",
    "        })\n",
    "\n",
    "数据C5.to_excel(\"20春_Web数据挖掘_week02_liepin.xlsx\", sheet_name=\"搜查结果\")\n",
    "数据C5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "# ⬇️ Work log below\n",
    "### For my own reference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # while True:\n",
    "# name = r.html.xpath('//div[@class=\"job-card\"]/dl/dd/ul/li[2]/a/text()')\n",
    "# x=[]\n",
    "# for i in name:\n",
    "#     if '某' in i:\n",
    "#         del i\n",
    "#     elif '知' in i:\n",
    "#         del i\n",
    "#     elif '北极' in i:\n",
    "#         del i\n",
    "#     elif '初' in i:\n",
    "#         del i\n",
    "#     elif '平' in i:\n",
    "#         del i\n",
    "#     else:\n",
    "#        x.append(i) \n",
    "# print(len(x))\n",
    "# #     x = i.split[0]\n",
    "# #     while '某' in oo:\n",
    "# #         name.remove('某')\n",
    "# # name\n",
    "# #     url = r.html.xpath('//dd[@class=\"right-info\"]/ul/li[2]/a/@href')\n",
    "# # print(len(url))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# c_url = r.html.xpath('//dd[@class=\"right-info\"]/ul/li[2]/a/@href|//div[@class=\"job-card\"]/dl/dd/ul/li[2]/a/text()')\n",
    "# print(c_url)\n",
    "\n",
    "# if\n",
    "# name=r.html.xpath('//div[@class=\"job-card\"]/dl/dd/ul/li[2]/a/text()')\n",
    "# c_name=[]\n",
    "# for i in c_url:\n",
    "#     print(i)\n",
    "#     if :\n",
    "#         c_name.append(r.html.xpath('//div[@class=\"job-card\"]/dl/dd/ul/li[2]/a/text()'))\n",
    "# c_name"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "749px",
    "left": "1125.609375px",
    "top": "110px",
    "width": "281.389px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
